Abstract



We consider the issue of reporting the results of search experiments in the most
unbiased and efficient way, i.e. in a way which allows an easy interpretation and
combination of results and which does not depend on whether the experimenters
believe that they have found the searched-for effect. Since this work uses the
language of Bayesian theory, to which most physicists are not accustomed, we find that
it could be useful to practitioners to have in a single paper a simple presentation
of Bayesian inference, together with an example of its application to the search for
rare processes.
1 Introduction

An often debated issue (see footnote 1) in frontier science research is how to report results
obtained from search experiments at the limit of the detector sensitivity. Sometimes
researchers simply have to state a clear null result, i.e. when all members of the
experimental team agree that no new phenomenon is indicated by the data. At other times they
may have some hints that the data could indicate the presence of the searched-for signal,
as a result of a more or less pronounced excess of events above the expected background
level. In lucky, and rare, cases new phenomena are seen in such a spectacular way that all
researchers agree and everybody is convinced. Clearly, reporting the result may become
problematic in the second case. "The experiment was inconclusive, and we had to use
statistics", somebody once said.

The purpose of this paper is to show how results of the search for rare phenomena 
can be presented, in order to best use the information contained in the experimental 
data, i.e. in the most powerful and unbiased way. Since the three situations sketched out 
above are, in reality, never so sharply separated, the presentation of the result should 
not depend on whether researchers feel that their case is a negative, doubtful, or positive 
one. Moreover, it is important that the pieces of evidence from different experiments can 
be combined in the most efficient way. If, for example, many independent data sets each 
provide a little evidence in favour of the searched-for signal, the combination of all data 
should enhance that hypothesis. If, instead, the indications provided by the different data 
sets are incoherent, their combination should provide a stronger constraint on the intensity 
of the postulated process. 

Typical fields of research in which the above described problematic situation arises 
are, to give a few examples, neutrino oscillations, rare decays, new particles, gravitational 
waves, and dark matter. All these processes have in common the fact that, under station- 
arity of the search conditions, the physical process can be modelled with high accuracy by 
a Poisson process, and the physical quantity (a mass, a cross-section, a branching ratio, 
a rate, etc.) of interest will be related to the intensity r of that process. 

Although the methods described in this paper are of general use, we think that they 
can be better understood by way of a case study. We consider the problem of inferring the 
rate of bursts of gravitational waves (g.w.) on Earth. This case presents typical features 
common to other frontier searches, but the problem remains unidimensional, since only 
one quantity is inferred (see footnote 2), and thus easy to describe. The extension to higher
dimensions is, at least conceptually, straightforward.

1) The many recent papers [?] on 'limits', not to mention
notes internal to experimental teams, give an idea of the present interest in the subject. However,
this article is not a review of the various 'prescriptions' suggested by the many authors involved in
the discussion. In fact, the point of view presented in this paper is that the search for the Holy Grail
containing the unique and objective prescription to calculate limits is a false problem. Since the cited
papers (with the exception of Ref. [?] and, to some extent, Ref. [?]) are written in this spirit, they
are irrelevant for this work. Another common point of all the cited authors, with the sole exception of
Ref. [?], is to consider the frequentist concept of coverage as good guidance. Zech [?] considers coverage
"a magic objective of classical confidence bounds. It has an attractive property from a purely aesthetic
point of view but it is not obvious how to make use of this concept". Our opinion about frequentistic
coverage is even more severe, and it has been discussed extensively in Ref. [?]. Moreover, comments
on Refs. [?] and [?], which have triggered most of the cited papers, can be found in Refs. [?] and [19].
In particular, one should not overlook the fact that the results obtained by frequentistic confidence
intervals, as well as those obtained by frequentistic hypothesis tests, are usually misunderstood and
might even induce researchers to draw misleading scientific conclusions.

2) For example, in neutrino oscillation searches the results are given in terms of the mixing angle and of
the mass-squared difference. Note that, also in this case, it would be very interesting to have the result

Since the methods used in the paper are based on Bayesian inference and subjective 
probability, to which most researchers are at present not accustomed, we feel that it is 
necessary to introduce this matter in an extensive and elementary way. In particular, we 
also think it is important to clarify some of the philosophical aspects which physicists tend 
to ignore, but which are crucial to the understanding and acceptance of the inferential 
framework which will be used. Therefore, we consider it convenient for the reader to
have a description of the case study and of the inferential framework in an almost
self-contained article (see footnote 3).

The paper is structured in the following way. In the next section we recall the 
present status and future prospects of g.w. burst search, stressing some of the important 
aspects of the experiments which affect our general considerations about the analysis 
strategy. Then, the inferential framework to be used to report results will be presented 
and discussed in depth, though remaining at an introductory level. In the core of the 
paper the inferential model will be applied to the case study, with general considerations 
and numerical examples. Finally some conclusions will be drawn. 



2 Gravitational-wave burst search 
2.1 Status and perspective 

The interest in g.w.'s is related to the astrophysical information they contain and
also to the implications their detection would have for fundamental physics [?]. Their
detection would, in fact, lead to confirmation of Einstein's general relativity predictions in
a more direct way than Hulse and Taylor's observations. Gravitational-wave detectors
could provide a direct measure of the waves and could also test their properties. In
particular, using a network of detectors, wave speed and polarization state can be inferred.
Regarding the emission process, the importance of g.w.'s lies in the fact that they pass
through matter without being significantly absorbed or scattered, unlike electromagnetic
waves and even weakly interacting neutrinos. Thus the information about the emission
process carried by g.w.'s is really unique (see Ref. [?] for a review of g.w. sources).



At present, one of the most interesting activities within that section of the com- 
munity which is operating resonant antennae is the search for evidence of g.w. bursts. 
These are defined as bunches of g.w.'s whose time width is smaller than the time constant 
of the detectors (the latter being typically of the order of milliseconds). Thus their energy
spread is expected to be flat across the whole of the frequency bandwidth of the detector. 
A burst of g.w.'s can be produced in a gravitational collapse associated with supernova 
explosions, or during the final stage of the coalescence of binary systems (neutron stars, 
black holes), or in processes involving massive black holes, such as the capture of a near 
body. 

Astrophysical estimates of rates and signal amplitudes for these processes on Earth
would seem discouraging in the light of the present theoretical ideas. In fact, given the
sensitivity of the present antennae, the expected rates are much too low to give a sizable
excess of candidate events above the expected background. The available detectors could,
in fact, detect a g.w. burst from the Galaxy if a process radiated 1% of a solar mass
($M_\odot$), which would yield a dimensionless wave amplitude on Earth of $10^{?}$ [26].
However, the expected supernova rate in the Galaxy could be in the range of one per 10-100
years (see e.g. Ref. [?]) and an emission of 1% $M_\odot$ into g.w.'s seems quite improbable.
Nevertheless, there is no solid ground for supposing that this hypothesized fraction of
energy is released into g.w.'s and even larger fractions are conceivable [22]. Moreover,
the burst rate could increase by a factor of about 1000 if the antennae were sensitive
to astronomical events within a distance of 10 Mpc, thus including the Virgo cluster.
Improved bar detectors [28], [?], as well as planned interferometers [?], are expected to
reach this level of sensitivity.

2, continued) in terms of cross-section of the process searched for, kept separate from the interpretation
in terms of the postulated oscillations. Then, one would deal also in this case with unidimensional
problems, i.e. cross-sections for bins of the incoming neutrino energy.

3) Brief and extensive physicist's introductions to subjective probability and Bayesian inference can be
found in Refs. [?] and [?], respectively.

In conclusion, although current prospects are not encouraging, the many uncer- 
tainties on the physics processes involved might still mean that surprises are in store 
and for this reason it is important to be prepared to exploit to the full the information 
provided by operating and planned detectors. 

2.2 Search strategy 

Gravitational-wave bursts are very weak signals, embedded in the noise of the 
detector. Thus, they can be extracted from the detector data by proper filtering, optimized 
to increase the signal-to-noise ratio (SNR) for this class of events [31]. The analysis is very 
difficult because of the low SNR, the rarity of the events, the uncertainty regarding their
shape, and the non-stationary noise of the detectors. In fact, although some of the
sources of noise, like narrow-band Brownian noise and electronic wide-band noise, are well
understood and their expectations can be modelled with reasonable accuracy, there are
other sources of background which are not easy to handle, and not even easy to recognize.

A filter for g.w. burst search is optimized to increase the SNR for δ-like signals, and
a candidate event is defined when the filtered signal exceeds a certain energy threshold.
The candidate event is characterized by energy, arrival time, above-threshold duration,
and other relevant quantities related to the spectral content in different bandwidths [32].
All event characteristics are, in fact, important. For example, the shape of the electric
signal coming from the transducer can be used to discriminate g.w. bursts from
background.

The rarity of the events looked for and the presence of irreducible background make 
it impossible to do this search using a single detector, even if seismic, electromagnetic and 
other sensors are often used to veto the data of a g.w. detector (see e.g. Refs. [?], [?]).
Therefore, a coincidence among at least two parallel and distant detectors is required (see footnote 4).
Gravitational-wave bursts are, in fact, supposed to irradiate the Earth uniformly so that
detectors spread out across the Earth's surface should be able to detect g.w.'s related to 
the same physical event. Hence the whole analysis procedure consists of data filtering, 
event selection, vetoes when necessary, and the final coincidence analysis. 

An important parameter of the procedure for extracting g.w. burst candidates 
is the coincidence window, that is the time width within which the coincidences are 
considered. The window can be fixed by considering the physics of the process and the
characteristics of the apparatus [?].

At present five resonant g.w. antennae are in operation, and this is really the first
time that it is possible to search for g.w.'s with such a high number of detectors working
simultaneously: Explorer [?], NAUTILUS [?] and AURIGA [?] in Italy; Allegro in the
USA; Niobe [?] in Australia. A collaboration has been established between the
experimental groups, with the aim of performing coincidence searches of g.w. bursts. The joint
effort resulted in 1997 in a data exchange protocol, in which candidate events are precisely
defined and the procedure for exchanging data was agreed (see e.g. Ref. [?] and related
web sites). It is, then, important to agree on an optimal way for publishing results such
that all information contained in the data can be used in the most efficient way.

4) 'Parallel' means oriented in such a way as to be sensitive to the same polarization and direction of the
incoming wave, and 'far' means located at a distance such that the long-range correlated background
is considered to be negligible.

Coincidence experiment procedures are essentially the same as those used since 
the beginning of g.w. experiments [?]. Recently they have been used for the analysis of
Explorer and Allegro 1991 data [?] and analyses of Explorer-NAUTILUS (1994-1996)
and Explorer-Niobe (1995) [?]. The only relevant background to coincidence analysis is
due to accidental coincidences, which can be estimated with high accuracy by off-timing 
techniques. 

For the sake of simplicity we consider here only coincidences between a pair of
parallel detectors. The rate of background ($r_b$) due to the accidental coincidences between
the candidate events is usually evaluated by the average of the coincidences at shifted
times. Alternatively, one can make use of the individual background rates ($r_1$ and $r_2$) and
of the coincidence window $w$ to evaluate it as $r_b = r_1 r_2 w$. The two estimations of
the expected accidental coincidence rate usually give the same result [?]. Once $r_b$ is
evaluated, the observed number of coincidences in a given observation time $T$ due to
background is described by a Poisson distribution. This is because accidental coincidences
fulfil the conditions which define a Poisson process, if the noise is stationary during $T$.
Therefore, the observed frequency distribution of off-timing coincidences is expected to
be very close to the Poisson probability distribution of parameter $\lambda_b = r_b T$. As the
distributions actually observed are indeed of that kind, researchers are highly confident
about the probability distribution of background coincidences.
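The two ways of estimating the accidental coincidence rate described above can be compared in a small simulation. The following is only a sketch under stated assumptions: the two detectors are modelled as independent stationary Poisson streams, a coincidence is a pair of events closer than half the window on either side, and all names and numerical values (`r1`, `r2`, `w`, `T`, the list of shifts) are hypothetical choices made for illustration, not values taken from any experiment.

```python
import random

def poisson_event_times(rate, duration, rng):
    """Event times of a stationary Poisson process (exponential waiting times)."""
    times, t = [], rng.expovariate(rate)
    while t < duration:
        times.append(t)
        t += rng.expovariate(rate)
    return times

def count_coincidences(times_a, times_b, window):
    """Count pairs (a, b) with |a - b| < window / 2; both lists must be sorted."""
    half, count, lo = window / 2.0, 0, 0
    for a in times_a:
        # Skip events of the second stream that are too early for this 'a'.
        while lo < len(times_b) and times_b[lo] <= a - half:
            lo += 1
        j = lo
        while j < len(times_b) and times_b[j] < a + half:
            count += 1
            j += 1
    return count

def shifted_time_estimate(times_a, times_b, window, duration, shifts):
    """Average number of coincidences after shifting the second stream in time."""
    counts = []
    for s in shifts:
        shifted = sorted((t + s) % duration for t in times_b)
        counts.append(count_coincidences(times_a, shifted, window))
    return sum(counts) / len(counts)

rng = random.Random(1)
T, r1, r2, w = 10_000.0, 0.5, 0.3, 0.1       # illustrative values only
det1 = poisson_event_times(r1, T, rng)
det2 = poisson_event_times(r2, T, rng)

expected = r1 * r2 * w * T                    # product formula times T
observed = count_coincidences(det1, det2, w)  # zero-delay coincidences
off_timing = shifted_time_estimate(
    det1, det2, w, T, shifts=[10.0 * k for k in range(1, 21)])

print(f"r1*r2*w*T = {expected:.1f}, zero delay = {observed}, "
      f"off-timing average = {off_timing:.1f}")
```

Since no signal is injected, the zero-delay count is itself a pure background fluctuation around the product-formula expectation, while the off-timing average estimates the same quantity with a smaller statistical spread.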

3 Probability of accidental coincidences versus probability of burst rates 

Let us begin by illustrating the kind of problems that can arise in interpreting
results of coincidence experiments, if not properly stated. Let us imagine that $n_c$
coincidence events have been observed during the effective observation time $T$. The probability
of observing $n_c$ events, given a Poisson process of intensity $r_b$, is

$$P(n_c \mid r_b) = \frac{e^{-r_b T}\,(r_b T)^{n_c}}{n_c!} \,. \qquad (1)$$

Two remarks are now in order. First, one should be very careful about calling $P(n_c \mid r_b)$
the 'probability of the observed number of coincidences', because what has been observed
is sure and no longer belongs to the domain of the uncertain, to which probability applies
(the certain event has probability 1). $P(n_c \mid r_b)$ is, instead, the probability of observing the
hypothetical number of coincidences $n_c$, under the condition that the stochastic process
is described by a Poisson of constant and precisely known intensity $r_b$ during the
observation time $T$. Second, $P(n_c \mid r_b)$ does not provide, by itself, a result concerning what the
researchers are interested in, i.e. the rate of g.w. bursts. In fact, $P(n_c \mid r_b)$ is a probabilistic
statement about the possible outcome $n_c$, and not about the uncertain rate of g.w. bursts.
However, it is rather intuitive that, if the observed number of coincidences is of the order of
the expected value of the background, i.e. $n_c \approx r_b T \pm \sqrt{r_b T}$, the background is considered
to describe the outcome of the experiment well, while if $(n_c - r_b T)/\sqrt{r_b T} \gg 1$, suspicion
is raised that some of the observed coincidences could be attributed to g.w. bursts (or,
more precisely, to any other physical effect not considered as background). In this paper
we consider that the only hypothesized contribution beyond the known background is that of g.w.
bursts. As a consequence, there will be g.w. burst rates in which we believe more, others
in which we believe less, and others that we rule out. In other words, we are faced with
an inferential problem, which must be treated with care to avoid reporting the result in a
way which might be misleading (see e.g. examples given in Ref. [18]).

To give a numerical example, let us take the case of $\lambda_b = r_b \cdot T = 100$, and let
us assume that 130 coincidences have been observed. The probability of $n_c = 130$, given
$\lambda_b = 100$, is $6 \cdot 10^{-4}$, but one should not say that 'there is a probability of $6 \times 10^{-4}$ that the
data come from the background'. In fact this would imply that, 'with 99.94 % probability,
the data do not come from background', i.e. 'they have to be attributed almost certainly
to a genuine signal'. Indeed, the observation of a low-probability event does not imply
that the hypothesis considered to be the cause of it (the so-called 'null hypothesis' $H_0$, in
our case $H_0$ = 'background is the only source of candidate events') has to be ruled out.

One can recognize, behind the logic of standard hypothesis tests with which we 
are all familiar, a revised version of the classical proof by contradiction. In standard 
dialectics, one assumes a hypothesis to be true, then looks for a logical consequence which 
is manifestly false, in order to reject the hypothesis. The 'slight' difference introduced in 
the 'classical' statistical tests is that the false consequence is replaced by an improbable 
one. The argument might seem convincing at first sight, but it has no logical grounds. In 
fact, no matter how small the probability is, whatever is observed is not in contradiction 
with the null hypothesis, unless it is really impossible. This becomes self-evident when 
the probability of whatever can be observed is so small that this kind of reasoning would 
rule out $H_0$ whatever one observes. For example, in our numerical example even
$P(n_c = 100 \mid \lambda_b = 100) = 4\,\%$ is below the standard probability level under which an event
is declared 'improbable'. This is the reason why statisticians have invented 'p-values',
i.e. 'probability of the tail(s)' (see e.g. Ref. [?]). For example, one would say, in our
case, that the reason why the data are against the null hypothesis is not simply because
$P(n_c = 130 \mid \lambda_b = 100) = 6 \cdot 10^{-4}$ but because $P(n_c \ge 130 \mid \lambda_b = 100) = 0.23\,\%$. But this
does not solve the problem; it makes it worse, because one is considering the conditional
probability of not only what has actually been observed, but also what has not been
observed (see e.g. Refs. [?], [41]).
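The three numbers quoted in this numerical example (about $6 \cdot 10^{-4}$, about 4 % and about 0.23 %) can be checked directly from the Poisson formula (1). The snippet below is a minimal sketch; the function names are hypothetical, and the probability is evaluated through logarithms only to avoid overflow in the factorial.

```python
import math

def poisson_pmf(n, lam):
    """P(n | lam) for a Poisson distribution, computed via logarithms."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))

def poisson_tail(n, lam):
    """P(N >= n | lam): the 'probability of the tail' used for p-values."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(n))

lam_b = 100.0                        # expected background, lambda_b = r_b * T
p_130 = poisson_pmf(130, lam_b)      # probability of exactly 130 coincidences
p_100 = poisson_pmf(100, lam_b)      # probability of the most probable outcome
p_tail = poisson_tail(130, lam_b)    # probability of 130 or more coincidences

print(f"P(n_c = 130)  = {p_130:.2e}")
print(f"P(n_c = 100)  = {p_100:.3f}")
print(f"P(n_c >= 130) = {p_tail:.2%}")
```

Note that even the most probable outcome, $n_c = 100$, has a probability of only about 4 %, which is the point made in the text: a small probability of the observed outcome cannot, by itself, rule out the hypothesis.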

Although we cannot presume to have been fully convincing with these very brief 
critical remarks and therefore refer the reader to more general discussions on the subject 
(see e.g. Ref. [?] and references therein), the message is that one is not allowed to evaluate
the probability of an effect (or, even worse, the probability of an effect plus that of all 
rarer effects not actually observed), given a certain cause, and then to consider it as if it 
were the probability of the cause itself. 

Some readers might wonder why this paper is making such a big deal about the
criticism expressed above, which after all seems to be founded on intuition, logic and good
sense. The reason is that the standard education of physicists on the subject of
probability is based on a very peculiar and unnatural point of view (frequentism) which prevents
the probability of causes, i.e. what Poincaré calls 'the essential problem of the experimental
method' [?], from being talked about. However, despite their education, physicists constantly
make use of this concept, most of the time correctly, as happens in simple routine
applications. But sometimes the combination of good intuition and an unsuitable statistical
approach yields wrong conclusions, as reported, e.g., in Ref. [?]. Given this situation, we
think that there is a strong probability that what we are going to say about the way of
reporting results will be misunderstood, if it is not clear what is meant by probability of
causes (or of hypotheses, or of true values) and how this can be evaluated on the basis of
all available knowledge. Therefore, for the convenience of the reader, in the next section
we give a short introduction to the problem of inference, extracted from Ref. [?] and
adapted to this context.



4 From data to true values 

4.1 Learning from observations: creating or modifying knowledge? 

Every measurement is made with the purpose of increasing the knowledge of the 
person who performs it, and of anybody else who may be interested in it, like the members 
of a scientific community. It is clear that the need to perform a measurement indicates 
that one is in a state of uncertainty with respect to something, e.g. the value of a well- 
defined physics quantity. In all cases, the measurement has the purpose of modifying a 
given state of knowledge. One might be tempted to say 'acquire', instead of 'modify', the 
state of knowledge, thus indicating that the knowledge could be created from nothing by 
the act of measuring. However, it is not difficult to see that, in all cases, what we are 
dealing with is just an updating process, in the light of new facts and of some reason. 
To give an example about which everyone has good intuition, let us take the case of the 
measurement of the temperature in a room, using a digital thermometer (just to avoid 
uncertainty in the reading), and let us suppose that we get 21.7°C. Although we may 
be uncertain about the tenths of a degree, there is no doubt that the measurement will 
have narrowed the interval of temperatures considered possible before the measurement: 
those compatible with the physiological feeling of a comfortable environment. According 
to our knowledge of the thermometer used, or of thermometers in general, there will be 
values of temperature in a given interval around 21.7 °C in which we believe more and 
values outside the interval in which we believe less. It is, however, also clear that if the 
thermometer had indicated, for the same physiological feeling, 17.3 °C, we might suspect 
that it was not well calibrated, while if it had indicated 2.5 °C we would have no doubt 
that the instrument was not working properly. 

The three cases correspond to three different degrees of modification of the knowl- 
edge. In particular, in the last case the modification is null (but even in this case we have 
learned something: the thermometer does not work!). 

So, what makes us improve our knowledge after an empirical observation, is not the 
observation alone, but the observation framed in prior knowledge about measurand and 
measurement. Trained physicists always have such prior knowledge and often use it un- 
consciously. Imagine someone who has no scientific or technical education at all, entering 
a physics laboratory and reading a number on an instrument: his scientific knowledge will
not improve at all, apart from the triviality that a given instrument displayed a number
(not much of a contribution to knowledge!).

4.2 From the probability of the observables to the probability of the true 
values 

Summarizing the argument so far, after having performed an experiment, which
has resulted in the observed value $x$ being read on an instrument, there are some values
of the physical quantity (generically indicated by $\mu$) in which we believe more (we say
'they are more probable') and some others in which we believe less. The different possible
true values can be characterized by a p.d.f. $f(\mu \mid x)$, conditioned by the observation $x$. To
be more precise, $f(\mu)$ depends on many other pieces of information, like knowledge of the
instruments, of the kind of measurement, and of reasonable values of $\mu$ to be expected.
So, using K() to indicate the 'knowledge', we should strictly write

$$f(\mu \mid x) \longrightarrow f(\mu \mid x, K(\mathrm{instr.}), K(\mathrm{meas.}), K(\mu)) \,,$$

although one often simply writes $f(\mu \mid x)$, or even $f(\mu)$, implicitly assuming the conditions.

Clearly, $f(\mu \mid x)$ cannot be evaluated as relative frequencies of a long-run
experiment. It would be absurd to imagine a distribution of the values of $\mu$ for a given value $x$
read on the instrument, as true values are not directly observable, being related to abstract
concepts. Instead, relative frequencies can be used to evaluate the detector response:

$$f(x \mid \mu) = f(x \mid \mu, K(\mathrm{instr.}), K(\mathrm{meas.})) \,.$$

This can be done either by calibration with respect to a reference value or by Monte Carlo
simulation, as is currently done when $\mu$ refers to a quantity for which a calibration cannot
be made (see footnote 5). More often, $f(x \mid \mu)$ is evaluated by reasonable assumptions, like when we
assume a Gaussian model of error distribution, or that the observed number of accidental
coincidences is described by a Poisson distribution. For example, one has to remember
that, no matter how much the off-timing distribution might be Poisson-like, this empirical
observation cannot be considered, strictly speaking, a proof, having the same strength as
a mathematical theorem. Nevertheless, the fact that this observation will lead practically
all researchers to believe a certain hypothesis makes it a typical example of the type of
inferential process we are talking about.

The function $f(x \mid \mu)$ is usually called likelihood, since it quantifies how likely it is
that $\mu$ will produce any given $x$. Note that this function can be easily misinterpreted: as
a probability density function it is a function of $x$, since it describes the beliefs on $x$ for a
given value of $\mu$. As a mathematical function it is also a function of $\mu$ in the sense that the
p.d.f. $f(x \mid \mu)$ depends on the 'parameter' $\mu$. However, it is not correct to say that $f(x \mid \mu)$
measures the belief that $x$ comes from $\mu$ (in the sense that the observable $x$ has to be
attributed to the true value $\mu$). Instead, this degree of belief is denoted by $f(\mu \mid x)$. The
confusion (see footnote 6) between $f(\mu \mid x)$ and $f(x \mid \mu)$ is a source of really terrible mistakes [?].

5) Note that a Monte Carlo program is nothing but a summary of the most reliable beliefs about the
phenomenology and the measurement process under study.

6) Anticipating what will become clear in a while: if the prior is uniform, then $f(x \mid \mu)$ and $f(\mu \mid x)$ have
identical mathematical expression. This is the reason why the likelihood curve is often taken as if it
were probability information about $\mu$.

So, the problem is how to get from the observation $x$ to $f(\mu)$. Before going to the
formal derivation of the formula to update the beliefs, let us try to justify intuitively the
general rule, considering what happens when $\mu$ can assume only two values. If they seem
to us equally possible, it is natural to favour the value which gives the highest likelihood of
producing $x$. For example, assuming $\mu_1 = -1$, $\mu_2 = 10$, considering a Gaussian likelihood
with $\sigma = 3$, and having observed $x = 2$, one would tend to believe that the observation
is most likely caused by $\mu_1$. If, on the other hand, we add the extra information that
the quantity of interest is positive, then $\mu_1$ is no longer the most probable cause but an
impossible one; $\mu_2$ becomes certain. There are, in general, intermediate cases in which,
because of previous knowledge, one tends to believe a priori more in one or other of the
causes. (For example, one could imagine a small Monte Carlo in which $\mu_1$ and $\mu_2$ are
randomly chosen with probability ratio 1 to $10^{?}$: where, then, does $x = 2$ come from?) It
follows that, in the light of a new observation, the degree of belief of a given value of $\mu$
will be proportional to

— the likelihood that $\mu$ will produce the observed effect;

— the degree of belief attributed to $\mu$ before the observation, quantified by $f_0(\mu)$ (the
'prior').

We have finally

$$f(\mu \mid x) \propto f(x \mid \mu) \cdot f_0(\mu) \,. \qquad (2)$$

This is one of the ways of writing Bayes' theorem. The proportionality factor is simply 
given by the normalization of $f(\mu \mid x)$ to 1. At this point it is important to write Bayes'
formula once more, making all conditions explicit:

$$f(\mu \mid x, K(\mathrm{instr.}), K(\mathrm{meas.}), K_0(\mu)) \propto f(x \mid \mu, K(\mathrm{instr.}), K(\mathrm{meas.})) \cdot f(\mu \mid K_0(\mu)) \,,$$

where $f(\mu \mid K_0(\mu)) = f_0(\mu)$.
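The two-value example discussed above (candidate true values -1 and 10, Gaussian likelihood with standard deviation 3, observation 2) can be worked out numerically with the proportionality rule (2). The sketch below is illustrative only: the function names are hypothetical, and the unequal prior ratio of 1 to 100 is an arbitrary assumption chosen for the demonstration, not the ratio used in the text.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Gaussian likelihood f(x | mu) with standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def posterior(x, hypotheses, priors, sigma):
    """Posterior probabilities of discrete hypotheses: likelihood * prior, normalized."""
    weights = [gaussian_pdf(x, mu, sigma) * p for mu, p in zip(hypotheses, priors)]
    total = sum(weights)
    return [wgt / total for wgt in weights]

mus = [-1.0, 10.0]      # the two candidate true values
x_obs, sigma = 2.0, 3.0

equal = posterior(x_obs, mus, [0.5, 0.5], sigma)      # equally possible a priori
skewed = posterior(x_obs, mus, [1.0, 100.0], sigma)   # assumed prior ratio 1 : 100
positive = posterior(x_obs, mus, [0.0, 1.0], sigma)   # quantity known to be positive

for label, post in (("equal priors", equal), ("1:100 priors", skewed),
                    ("positive only", positive)):
    print(f"{label:14s} P(mu=-1 | x) = {post[0]:.3f}, P(mu=10 | x) = {post[1]:.3f}")
```

With equal priors the likelihood ratio of about 21 in favour of the first value decides the matter; a sufficiently unbalanced prior reverses the conclusion, and a prior of zero makes the first value impossible whatever the likelihood.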



4.3 Derivation of Bayes' theorem from a physicist's perspective 

The concepts illustrated in the previous section can be formalized using the fol- 
lowing reasoning. 

— Before doing the experiment we are uncertain about $\mu$ and about $x$: we know neither
the true value, nor the observed value. Generally speaking, this uncertainty is
quantified by $f(\mu, x)$.

— Under the hypothesis that we observe $x$, we can calculate the conditional
probability

$$f(\mu \mid x) = \frac{f(\mu, x)}{f(x)} = \frac{f(\mu, x)}{\int f(\mu, x)\, d\mu} \,. \qquad (3)$$

— At this point, it seems we are stuck, because we are usually more uncertain about
$\{\mu, x\}$ than about $\mu$. However, we note that $f(\mu, x)$ can be calculated from $f(x \mid \mu)$
and $f(\mu)$:

$$f(\mu, x) = f(x \mid \mu) \cdot f(\mu) \,. \qquad (4)$$

This is the key observation to solve our problem.

— If we do an experiment we need to have a good idea of the behaviour of the
apparatus, therefore $f(x \mid \mu)$ must be a narrow distribution. The most uncertain
contribution remains the prior knowledge about $\mu$, quantified by $f_0(\mu)$ (the
subscript is to remind us that this is a prior about $\mu$). Note that it is all right that
$f_0(\mu)$ is rather broad ('vague'), because we want to learn about $\mu$ itself, performing
an experiment with an apparatus having a narrow response around true values.

— Putting all the pieces together we get the standard formula of Bayes' theorem for
uncertain quantities:

$$f(\mu \mid x) = \frac{f(x \mid \mu) \cdot f_0(\mu)}{\int f(x \mid \mu) \cdot f_0(\mu)\, d\mu} \,. \qquad (5)$$

The steps followed in this proof of the theorem should convince the reader that $f(\mu \mid x)$
calculated in this way is the most we can say about $\mu$ with the given state of information.

One may be worried about the presence of $f_0(\mu)$ in the result; but this is simply
unavoidable, and for this reason we should be relaxed about it [19]. $f_0(\mu)$ becomes
irrelevant in routine cases because the likelihood is usually very narrow (when seen as a
mathematical function of $\mu$) with respect to $f_0(\mu)$, such that the prior is reabsorbed in
the normalization factor (see footnote 7). In contrast, in the sophisticated frontier-science measurements
$f_0(\mu)$ does matter, as will be illustrated below.

Let us conclude this very short introduction on Bayesian inference. 

— It is impossible to give a probabilistic result on a physics quantity without passing 
through priors ("it is impossible to make the inferential omelette without breaking 
the Bayesian egg", some like to say). 

— When the probabilistic result seems not to depend on priors, we are in a condition 
(or we make the tacit assumption!) in which the prior distribution acts as a con- 
stant; this approximation is very good when the likelihood is much narrower than 
the prior, as usually happens in routine measurements. 

— In frontier science, priors must be considered with much care, and this will be the 
main task of this paper. 
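The second remark can be checked numerically. The following is only a sketch, not taken from the paper: it assumes a Gaussian likelihood and compares a constant prior with a vague Gaussian prior on a grid of μ values. The function names, the grid and the numbers 5.0, 0.1 and 10.0 are illustrative assumptions.

```python
import math

def posterior_mean(x_obs, sigma, prior, mu_grid):
    """Posterior mean of mu on a grid, with f(mu | x) proportional to f(x | mu) * f0(mu)."""
    weights = [math.exp(-0.5 * ((x_obs - mu) / sigma) ** 2) * prior(mu)
               for mu in mu_grid]
    norm = sum(weights)
    return sum(w * mu for w, mu in zip(weights, mu_grid)) / norm

def flat(mu):
    return 1.0                                  # constant prior

def broad(mu):
    return math.exp(-0.5 * (mu / 10.0) ** 2)    # vague Gaussian prior (assumed width 10)

mu_grid = [i * 0.01 for i in range(-2000, 2001)]   # mu from -20 to 20

# Narrow likelihood (sigma = 0.1): the two priors give almost the same answer.
shift_narrow = (posterior_mean(5.0, 0.1, flat, mu_grid)
                - posterior_mean(5.0, 0.1, broad, mu_grid))
# Broad likelihood (sigma = 5): the choice of prior now matters.
shift_wide = (posterior_mean(5.0, 5.0, flat, mu_grid)
              - posterior_mean(5.0, 5.0, broad, mu_grid))
print(shift_narrow, shift_wide)
```

With the narrow response the difference between the two posterior means is far below the resolution, which is the "routine measurement" regime; with the broad response the prior visibly pulls the result.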



5 Inferring gravitational-wave burst rate 
5.1 Modelling the inferential process 

Now that the inferential scheme has been set up, let us rephrase our problem in 
the language of Bayesian statistics: 

— the physical quantity of interest, and with respect to which we are in a state of
great uncertainty, is the g.w. burst rate r;

— we are practically sure about the expected rate of background events r_b (but not
about the number which will actually be observed);

— what is certain is the number of coincidences n_c which have been observed (stating
that the observed number of coincidences is n_c ± √n_c does not make any sense!),
although we do not know how many of these events have to be attributed to
background and how many (if any) to g.w. bursts.

For a given hypothesis r the number of coincidence events which can be observed in the
observation time T is described by a Poisson process having an intensity which is the sum
of that due to background and that due to signal. Therefore the likelihood is

    f(n_c \mid r, r_b) = \frac{e^{-(r + r_b) T} \, ((r + r_b) T)^{n_c}}{n_c!} ,    (6)

and, making use of Bayes' theorem, we get

    f(r \mid n_c, r_b) \propto \frac{e^{-(r + r_b) T} \, ((r + r_b) T)^{n_c}}{n_c!} \, f_0(r) .    (7)

At this point we are faced with the problem of what f_0(r) to choose. The best way
of understanding why this choice can be troublesome is to illustrate the problem with
numerical examples. Let us consider T as unit time (e.g. one month), a background rate
r_b such that r_b × T = 1, and the following hypothetical observations: n_c = 0; n_c = 1;
n_c = 5.
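As a minimal sketch of the model just described, the Poisson likelihood and the unnormalized final distribution can be written as follows. The function names and the sample values of r are our own illustrative choices, not part of the paper.

```python
import math

def likelihood(n_c, r, r_b, T=1.0):
    """Poisson probability of n_c coincidences for signal rate r and background rate r_b."""
    lam = (r + r_b) * T                      # expected number of coincidences
    return math.exp(-lam) * lam ** n_c / math.factorial(n_c)

def unnormalized_posterior(r, n_c, r_b, prior, T=1.0):
    """Likelihood times prior f_0(r); the normalization is left to the caller."""
    return likelihood(n_c, r, r_b, T) * prior(r)

# Numerical example of the text: T = 1 month, r_b * T = 1, n_c = 0, 1, 5.
for n_c in (0, 1, 5):
    values = [likelihood(n_c, r, r_b=1.0) for r in (0.0, 1.0, 4.0)]
    print(n_c, [round(v, 4) for v in values])
```

Seen as a function of r for fixed n_c, the same expression is what gets multiplied by the prior; the choice of `prior` is exactly the open question discussed next.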



To make it clear, if one wants to measure the temperature in a room one does not choose a thermometer
with an r.m.s. error of 5 degrees, because one's physiological prior is more accurate than what could
be learned from such a measurement; but the same instrument provides useful information at 200 °C
or at −50 °C.

Figure 1: Distribution of the values of the rate r, in units of events/month, inferred from an
expected rate of background events r_b = 1 event/month, an initial uniform distribution f_0(r) = k
and the following numbers of observed events: 0 (continuous); 1 (dashed); 5 (dotted).



5.2 Uniform prior 

One might think that a good 'democratic' choice would be a uniform distribution
in r, i.e. f_0(r) = k. Inserting this prior in (7) and normalizing the final distribution we
get, with T taken as the unit of time (see e.g. Ref. [19]),

    f(r \mid n_c, r_b, f_0(r) = k) = \frac{e^{-rT} \, ((r + r_b) T)^{n_c}}{n_c! \, \sum_{n=0}^{n_c} \frac{(r_b T)^n}{n!}} .    (8)

The resulting final distributions are shown in Fig. 1. For n_c = 0 and 1 the distributions are
peaked at zero, while for n_c = 5 the distribution appears so neatly separated from r = 0
that it seems a convincing proof that the postulated physics process searched for does
exist. In the cases n_c = 0 and 1, researchers usually present the result with an upper limit
(typically 95%), on the basis that f(r) seems compatible with no effect, as suggested by
Fig. 1. For example, in the simplest and well-known case of n_c = 0 the 95% C.L. upper
limit is 3 events/month. The usual meaning one attributes to the limit is that, if the
physics process of interest exists, then there is a 95% probability that its rate is below 3
events/month, resulting from

    \int_0^{3\ \text{events/month}} f(r \mid n_c = 0, r_b = 1, f_0(r) = k) \, dr = 0.95 .    (9)

But there are infinitely many other probabilistic statements that can be derived from
f(r | r_b, n_c = 0). For example, P(r > 3 events/month) = 5%, P(r > 0.1 events/month) = 90%,
P(r > 0.01 events/month) = 99%, and so on. Without doubt, researchers will not hesitate to
publish the 95% upper limit, but they would feel uncomfortable stating that they believe
99% that, if the g.w. bursts exist at all, then the rate is above 0.01 events/month. The
reason for this uneasiness can be found in the uniform prior, which might not correspond
to the prior knowledge that researchers really have.

If one assesses a probability value of 99%, one should be as confident that the event will turn out to be
true as one would be of extracting a white ball from an urn containing 99 white balls and one black.
If this is not the case, one is, consciously or not, responsible for misinformation. So, other people,
trusting the person who made that probability assessment, will form their opinion and make their
decisions using a probability value which does not correspond to what that person believes.

Let us, then, examine more closely the meaning of the uniform distribution and its
consequences. Saying that f_0(r) = k means that dP/dr = k, i.e. P ∝ Δr; for example,

    P(0.1 < r < 1) = \frac{1}{10} \, P(1 < r < 10) = \frac{1}{100} \, P(10 < r < 100)    (10)

and so on. But, taken literally, this prior is hardly ever reasonable. The problem is not
due to the divergence for r → ∞ which makes f_0(r) not normalizable (these kinds of
distributions are called 'improper'). This mathematical nuisance is automatically cured
when f_0(r) is multiplied by the likelihood, which, for a finite number of observed events,
vanishes rapidly enough for r → ∞. A much more serious problem is related to the fact
that the uniform distribution assigns to all the infinite orders of magnitude left of 1 a
probability which is only 1/9 of the probability of the decade between 1 and 10, or 1%
of the probability of the first two decades, and so on. This is the reason why, even if no
coincidence events have been observed, the final distribution obtained from zero events
observed (continuous curve of Fig. 1) implies that P(r > 1 event/month) = 37%.
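The probabilities quoted in this subsection can be cross-checked with a few lines of code. The sketch below assumes the uniform prior of this subsection and integrates the final distribution in closed form, using the identity that the integral of e^{-u} u^n / n! from a to infinity equals e^{-a} times the sum of a^k / k! for k = 0..n. The function names and the bisection search are our own choices.

```python
import math

def tail_probability(x, n_c, r_b, T=1.0):
    """P(r > x | n_c, r_b) for a uniform prior in r."""
    def upper_tail(a):
        # integral of e^-u u^n_c / n_c! from a to infinity
        return math.exp(-a) * sum(a ** n / math.factorial(n) for n in range(n_c + 1))
    return upper_tail((x + r_b) * T) / upper_tail(r_b * T)

def upper_limit(n_c, r_b, cl=0.95, T=1.0):
    """Rate below which a fraction `cl` of the final probability lies (bisection)."""
    lo, hi = 0.0, 1.0
    while tail_probability(hi, n_c, r_b, T) > 1.0 - cl:
        hi *= 2.0                      # enlarge the bracket until it contains the limit
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if tail_probability(mid, n_c, r_b, T) > 1.0 - cl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for x in (0.01, 0.1, 1.0, 3.0):
    print(x, tail_probability(x, n_c=0, r_b=1.0))
print(upper_limit(n_c=0, r_b=1.0), upper_limit(n_c=1, r_b=1.0))
```

For n_c = 0 the tail probability reduces to e^{-x}, so the 95% limit is ln 20 ≈ 3.0 events/month, and P(r > 1) = e^{-1} ≈ 37%, in agreement with the values quoted in the text.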

5.3 Jeffreys' prior 

A prior distribution alternative to the uniform can be based on the observation that
what often seems uniform is not the probability per unit of r, but rather the probability
per decade of r, i.e. researchers may feel equally uncertain about the orders of magnitude
of r, namely

    P(0.1 < r < 1) = P(1 < r < 10) = P(10 < r < 100) = \ldots    (11)

This implies that dP/d ln r = k, or dP/dr ∝ 1/r. This prior is known as Jeffreys' prior
and it is indeed very interesting, at least from a very abstract point of view (though it
tends to be misused, as is discussed in Ref. [0). If we take Jeffreys' prior literally, it does
not work in our case either. In fact, when inserted in Eq. (7), it produces a divergence
for r → 0. This is due to the infinite orders of magnitude left of 1, to each of which we
give equal prior probability, and to the fact that the likelihood (6) goes to a constant for
r → 0. Therefore, for any r_0 > 0, we have P(r < r_0)/P(r > r_0) = ∞. To get a finite
result we need a cut-off at a given r_min.

As an exercise, just to get a feeling of both the difference with respect to the case
of the uniform distribution, and the dependence on the cut-off, we report in Fig. 2 the
results obtained for the same experimental conditions as Fig. 1, but with a Jeffreys' prior
truncated at r_min = 0.1 and 0.01. One can see that the final distributions conditioned
by 0 or 1 events observed are pulled towards r = 0 by the new priors, while the case of
n_c = 5 is more robust, although it is no longer nicely separated from zero.
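The dependence on the cut-off can be made concrete with a small numerical sketch. It assumes a prior proportional to 1/r above r_min and integrates in u = ln r, where that prior is flat; `r_max` and `steps` are purely numerical choices of ours and do not come from the paper.

```python
import math

def tail_probability_jeffreys(x, n_c, r_b, r_min, T=1.0, r_max=50.0, steps=20000):
    """P(r > x | n_c, r_b) for a prior proportional to 1/r on r > r_min.

    In the variable u = ln r the prior is constant, so the final density in u is
    proportional to the likelihood; it is integrated with the trapezoidal rule.
    """
    def likelihood(r):
        lam = (r + r_b) * T
        return math.exp(-lam) * lam ** n_c      # constant factor 1/n_c! cancels

    def integral(a, b):
        ua, ub = math.log(a), math.log(b)
        h = (ub - ua) / steps
        total = 0.5 * (likelihood(a) + likelihood(b))
        total += sum(likelihood(math.exp(ua + i * h)) for i in range(1, steps))
        return total * h

    return integral(max(x, r_min), r_max) / integral(r_min, r_max)

# Zero observed events, r_b = 1 event/month: P(r > 1 event/month) for two cut-offs.
for r_min in (0.1, 0.01):
    print(r_min, tail_probability_jeffreys(1.0, n_c=0, r_b=1.0, r_min=r_min))
```

For zero observed events the probability that r exceeds 1 event/month falls well below the 37% obtained with the uniform prior, and it keeps decreasing as the cut-off is lowered, which is the pull towards r = 0 described above.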

5.4 Role of priors 

The strong dependence of the final distributions on the priors shown in this example
should not be considered a bad feature, as if it were an artifact of Bayesian inference.
Putting it the other way round, Bayesian inference reproduces, in a formal way, what
researchers already have clear in their minds as a result of intuition and experience. In the
numerical examples we are dealing with, the dependence of the final distributions on the
priors is just a hint of the fact that the experimental data are not so strong as to lead every
scientist to the same conclusion (in other words, the experimental and theoretical situation
is far from the well-established one upon which intersubjectivity is based). For this
reason, one should worry, instead, about statistical methods which advertise 'objective'
probabilistic results in such a critical situation.

Figure 2: Final distributions for the same experimental configuration of Fig. 1, but with a
Jeffreys' prior with cut-off at r_min = 0.01 events/month (left plot) and r_min = 0.1 events/month
(right plot).

When the experimental situation is more solid, as for example in the case of five
events observed out of only 0.1 expected from background, the conclusions become very
similar, virtually independent of the priors (see Fig. 3), unless the priors reflected really
widely differing opinions.

The possibility that scientists might have distant and almost non-overlapping priors,
such that agreement is reached only after a huge amount of very convincing data,
should not be overlooked, as this is, in fact, the typical situation in frontier research.
Even 100 events observed out of 0.1 expected from background are not a logical proof of
the existence of bursts, since the observation is not in contradiction with the background
alone. Nevertheless, any reasonable physicist will agree that this is highly unlikely (we
shall come back to the evolution of beliefs in Section 7.3).

Figure 3: Distribution of the values of the rate r, in units of events/month, inferred from five
observed events, an expected rate of background events r_b = 0.1 events/month, and the following
priors: uniform distribution f_0(r) = k (continuous); Jeffreys' prior truncated at r_min = 0.01
(dashed). The case of the Jeffreys' prior is also reported for r_b = 1 event/month (dotted).



5.5 Priors reflecting the positive attitude of researchers 

Having clarified the role of priors in the assessment of probabilistic statements
about true values, and their critical influence on frontier-research results, it is clear that,
in our opinion, "reference priors do not exist" [19]. However, we find that the "concept
of a 'minimal informative' prior specification - appropriately defined!" can sometimes
be useful, if the practitioner is aware of the assumptions behind the specification.

We can now ask ourselves what would be a prior specification common to rational
and responsible people who have planned, financed and operated frontier-type
experiments. This is what we call the positive attitude of researchers [20]. Certainly, the
researchers believed there was a good chance, depending on the kind of measurement,
that they would end up with a number of candidate events well above the background;
or that the physical quantity of interest was well above the experimental resolution; or
that a certain rate would be in the region of sensitivity. One can show that the results
obtained with reasonable prior distributions, chosen to model this positive attitude, are
very similar to those obtainable by an improper uniform prior and, in particular, the
upper/lower bounds obtained are very stable (see Sections 5.4.3 and 9.1.1 of Ref. [20]).



Let us apply this idea to the exercise we are dealing with: 0, 1 or 5 events observed
over a background of 1 event (Fig. 1). Searching for a rare process with a detector having a
background of 1 event/month, for an exposure time of one month, a positive attitude would
be to think that signal rates of several events per month are rather possible. On the other
hand, the fact that the process is considered to be rare implies that one does not expect
a very large rate (i.e. large rates would contradict previous experimental information),
and also that there is some belief that the rate could be very small, virtually zero. Let us
assume that the researchers are almost sure that the rate is below 30 events/month. We
can consider, as examples, the following prior distributions.

— A uniform distribution between 0 and 30:

    f_0(r) = \frac{1}{30}    (0 \le r \le 30) .    (12)

— A triangular distribution:

    f_0(r) = \frac{1}{450} \, (30 - r)    (0 \le r \le 30) .    (13)

— A half-Gaussian distribution of σ_0 = 10:

    f_0(r) = \frac{2}{\sqrt{2\pi} \, \sigma_0} \exp\left[ -\frac{r^2}{2 \sigma_0^2} \right]    (r \ge 0) .    (14)



The last two functions model the fact that researchers might believe that small values of
r are more possible than high values, as is often the case. Moreover, the half-Gaussian
distribution also models the more realistic belief that rates above 30 events/month are not
excluded, although they are considered very unlikely. The three priors are shown in the
upper plot of Fig. 4. The resulting final distributions are shown in the lower plot of the
same figure. The three solutions are practically indistinguishable, and, in particular, very
similar to the results obtained by an improper uniform distribution (Fig. 1). This suggests
that the improper uniform prior represents a practical and easy way of representing the
prior specification for this kind of problem if one assumes what we have called the positive
attitude of the researchers. Therefore, this prior could represent a way of reporting
conventional probabilistic results, if one is aware of the limits of the convention. Seeking
a truly objective probabilistic result (we like to stress the concept again) is a dream.

We will see in Section 7.2 that realistic priors can be roughly modelled by a log-normal distribution.
With parameters chosen to describe the positive attitude we are considering, this distribution would
give results practically equivalent to the three priors we are using now.

Figure 4: The upper plot shows some reasonable priors reflecting the positive attitude of
researchers: uniform distribution (continuous); triangular distribution (dashed); half-Gaussian
distribution (dotted). The lower plot shows how the results of Fig. 1, obtained starting from an
improper uniform distribution, (do not) change if, instead, the priors of the upper plot are used.
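The stability of the result under these three priors can be explored with a short numerical sketch. It assumes the uniform, triangular and half-Gaussian priors described in this subsection and the Poisson likelihood of Section 5.1; the integration range `r_max` and step `h` are numerical choices of ours.

```python
import math

def uniform_prior(r):
    return 1.0 / 30.0 if 0.0 <= r <= 30.0 else 0.0            # uniform on [0, 30]

def triangular_prior(r):
    return (30.0 - r) / 450.0 if 0.0 <= r <= 30.0 else 0.0    # triangular on [0, 30]

def half_gaussian_prior(r, sigma0=10.0):
    # half-Gaussian with sigma_0 = 10, defined for r >= 0
    return 2.0 / (math.sqrt(2.0 * math.pi) * sigma0) * math.exp(-r * r / (2.0 * sigma0 ** 2))

def tail_probability(x, n_c, r_b, prior, T=1.0, r_max=60.0, h=0.005):
    """P(r > x | n_c, r_b) from likelihood * prior, midpoint rule on [0, r_max]."""
    above = total = 0.0
    for i in range(int(r_max / h)):
        r = (i + 0.5) * h
        lam = (r + r_b) * T
        w = math.exp(-lam) * lam ** n_c * prior(r)
        total += w
        if r > x:
            above += w
    return above / total

for prior in (uniform_prior, triangular_prior, half_gaussian_prior):
    print(prior.__name__,
          [round(tail_probability(3.0, n_c, 1.0, prior), 3) for n_c in (0, 1, 5)])
```

For n_c = 0 the improper uniform prior gives P(r > 3 events/month) = e^{-3} ≈ 5%; the three proper priors stay within a fraction of a percentage point of that value.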
6 Prior-free presentation of the experimental evidence

At this point, we want to reassure the reader (who we imagine at this point to be 
swirling around in the stormy sea of subjectivism) that it is possible to present data in an 
'objective' way, on the condition that all thoughts of providing probabilistic results about 
the measurand are suspended. 

Let us take again Bayes' theorem (Eq. (5)), which we rewrite here in terms of the
uncertain quantities of interest

    f(r \mid n_c, r_b) \propto f(n_c \mid r, r_b) \cdot f_0(r) ,    (15)

and consider only two possible values of r, let them be r_1 and r_2. From (15) it follows
that

    \frac{f(r_1 \mid n_c, r_b)}{f(r_2 \mid n_c, r_b)} = \underbrace{\frac{f(n_c \mid r_1, r_b)}{f(n_c \mid r_2, r_b)}}_{\text{Bayes factor}} \cdot \frac{f_0(r_1)}{f_0(r_2)} .    (16)

This is a common way of rewriting the result of the Bayesian inference for a couple of
hypotheses, keeping the contributions due to the experimental evidence and to the prior
knowledge separate. The ratio of likelihoods is known as the Bayes factor and it quantifies
the ratio of evidence provided by the data in favour of either hypothesis. The Bayes factor
is considered to be practically objective because likelihoods (i.e. probabilistic description
of the detector response) are usually much less critical than priors about the physics
quantity of interest.

Note that this assumption might be questionable in the sophisticated field of g.w. search. For example,
the effects of local sources of noise in the detectors are not well understood. This is what makes a
substantial difference between a single detector and a coincidence experiment. The likelihood function
summarizes the best knowledge about the g.w. burst detection and identification, and about noise
behaviour, the tail of which can be very critical. In a coincidence experiment the detailed knowledge
of the background becomes uncritical, as the only relevant hypothesis which makes accidental
coincidence described by a Poisson distribution is the stationarity of the experimental conditions
over the considered observation time.

The Bayes factor can be extended to a continuous set of hypotheses r, considering
a function which gives the Bayes factor of each value of r with respect to a reference value
r_REF. The reference value could be arbitrary, but for our problem the choice r_REF = 0,
giving

    \mathcal{R}(r; n_c, r_b) = \frac{f(n_c \mid r, r_b)}{f(n_c \mid r = 0, r_b)} ,    (17)

is very convenient for comparing and combining the experimental results |^6|, Q. The
function ℛ has nice intuitive interpretations which can be highlighted by reordering the
terms of (16) in the form

    \frac{f(r \mid n_c, r_b)}{f_0(r)} \Big/ \frac{f(r = 0 \mid n_c, r_b)}{f_0(r = 0)} = \frac{f(n_c \mid r, r_b)}{f(n_c \mid r = 0, r_b)} = \mathcal{R}(r; n_c, r_b)    (18)

(valid for all possible a priori r values). ℛ has the probabilistic interpretation of relative
belief updating ratio, or the geometrical interpretation of shape distortion function of the
probability density function. ℛ goes to 1 for r → 0, i.e. in the asymptotic region in which
the experimental sensitivity is lost: as long as it is 1, the shape of the p.d.f. (and therefore
the relative probabilities in that region) remains unchanged. Instead, in the limit ℛ → 0
(for large r) the final p.d.f. vanishes, i.e. the beliefs go to zero no matter how strong
they were before. In the case of the Poisson process we are considering, the relative belief
updating factor becomes

    \mathcal{R}(r; n_c, r_b, T) = e^{-rT} \left( 1 + \frac{r}{r_b} \right)^{n_c} ,    (19)

with the condition r_b > 0 if n_c > 0.

Figure 5: Relative belief updating ratio ℛ for the Poisson intensity parameter r for the cases of
Fig. 1.

Figure 5 shows the ℛ function for the numerical examples considered above. The
abscissa has been drawn in a log scale to make it clear that several orders of magnitude are
involved. These curves transmit the result of the experiment immediately and intuitively:

— whatever one's beliefs on r were before the data, these curves show how one must
change them;

— the beliefs one had for rates far above 20 events/month are killed by the
experimental result;

— if one believed strongly that the rate had to be below 0.1 events/month, the data
are irrelevant;

— the case in which no candidate events have been observed gives the strongest
constraint on the rate r;

— the case of five candidate events over an expected background of one produces a
peak of ℛ which corroborates the beliefs around 4 events/month only if there were
sizable prior beliefs in that region.
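The remarks above can be reproduced with a minimal sketch of the relative belief updating ratio of Eq. (19). The function names and the sample rates are illustrative; the peak position follows from setting the derivative of ln ℛ with respect to r to zero.

```python
import math

def belief_updating_ratio(r, n_c, r_b, T=1.0):
    """Relative belief updating ratio of Eq. (19); needs r_b > 0 when n_c > 0."""
    return math.exp(-r * T) * (1.0 + r / r_b) ** n_c

def peak(n_c, r_b, T=1.0):
    """Position and height of the maximum (at r = 0 when there is no excess)."""
    r_m = max(n_c / T - r_b, 0.0)
    return r_m, belief_updating_ratio(r_m, n_c, r_b, T)

# Cases of Fig. 1: T = 1 month, r_b = 1 event/month, n_c = 0, 1, 5.
for n_c in (0, 1, 5):
    row = [belief_updating_ratio(r, n_c, 1.0) for r in (0.01, 0.1, 1.0, 4.0, 20.0)]
    print(n_c, ["%.3g" % v for v in row], peak(n_c, 1.0))
```

The ratio equals 1 at r = 0 for every n_c, is already below 0.01 at 20 events/month in all three cases, and for n_c = 5 has its maximum at 4 events/month.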

The case r_b = n_c = 0 yields ℛ(r) = e^{-rT}, obtainable starting directly from Eq. (17), defining ℛ,
and from Eq. (6), giving the likelihood. Also the case r_b → ∞ has to be evaluated directly from the
definition of ℛ and from the likelihood, yielding ℛ = 1 ∀ r; finally, the case r_b = 0 and n_c > 0 makes
r = 0 impossible, thus prompting a claim for discovery - and it no longer makes sense for the ℛ
function defined above to have that nice asymptotic behaviour in the insensitivity region.

It really is a 'must' and not a 'suggestion'. In fact, although probabilities may depend on individuals
('subjective'), the way they are updated follows from standard logic (yielding Bayes' theorem) and
thus is 'objective'.

Moreover there are some technical advantages in reporting the ℛ function as a result of
a search experiment.

— One deals with numerical values which can differ from unity only by a few orders
of magnitude in the region of interest, while the values of the likelihood can be
extremely low. For this reason, the comparison between different results given by
the ℛ function can be perceived better than if these results were published in
terms of likelihood.

— Since ℛ differs from the likelihood only by a factor, it can be used directly in
Bayes' theorem, which does not depend on constants, whenever probabilistic
considerations are needed. In fact,

    f(r \mid n_c, r_b) \propto \mathcal{R}(r; n_c, r_b) \cdot f_0(r) .    (20)

— The combination of different independent results on the same quantity r can be
done straightforwardly by multiplying individual ℛ functions:

    \mathcal{R}(r; \text{all data}) = \prod_k \mathcal{R}(r; \text{data}_k) .    (21)

— Finally, one does not need to decide a priori if one wants to make a 'discovery'
or an 'upper limit' analysis as conventional statistics teaches (see e.g. criticisms
in Ref. |^): the ℛ function represents the most unbiased way of presenting the
results and everyone can draw their own conclusions.
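The combination and updating rules of Eqs. (20) and (21) can be sketched as follows. The two runs below are invented numbers for illustration only, and the helper names are ours; logarithms are summed instead of multiplying the ratios directly, which is numerically safer when many results are combined.

```python
import math

def log_ratio(r, n_c, r_b, T):
    """ln of the belief updating ratio of Eq. (19) for one experiment."""
    return -r * T + n_c * math.log1p(r / r_b)

def combined_ratio(r, experiments):
    """Eq. (21): product of the individual ratios, accumulated as a sum of logs."""
    return math.exp(sum(log_ratio(r, n_c, r_b, T) for (n_c, r_b, T) in experiments))

def final_pdf_on_grid(prior, experiments, grid):
    """Eq. (20) on a uniform grid of r values, normalized with a simple sum."""
    weights = [combined_ratio(r, experiments) * prior(r) for r in grid]
    step = grid[1] - grid[0]
    norm = sum(weights) * step
    return [w / norm for w in weights]

# Two hypothetical, independent runs given as (n_c, r_b, T).
runs = [(5, 1.0, 1.0), (3, 1.0, 1.0)]
print(combined_ratio(2.0, runs))
```

For runs with the same background rate the product coincides with the ratio of a single run with the summed counts and exposure, here (8, 1.0, 2.0), which is a useful consistency check.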



7 A case study based on realistic detector performances 
7.1 Prior-free results

We now give a numerical example which uses the realistic parameters of an actual
g.w. antenna and simulates possible experimental outcomes that g.w. researchers could
be faced with. One of the best performances, in terms of sensitivity and duty cycle, was
obtained with the Explorer antenna in 1991 p6|. The antenna worked with a duty
cycle of 67% for ≈ 180 days, i.e. 122 effective days, at a noise level of ≈ 8 mK (that is,
in terms of signal amplitude, h = 7 · 10^{-19}). At the chosen threshold, h = 2.5 · 10^{-18}, the
event rate was roughly 100 events/day. We shall take this as the reference value of the
background rate for our numerical examples.

We now imagine a coincidence analysis between two antennae having the 1991
Explorer characteristics, parallel to each other, far enough apart not to be sensitive to the
same local effects and being operated for 1000 days. We consider here a fixed window of
0.2 s (see e.g. Refs. and [33]), yielding an expected rate of accidental coincidences
of r_b = 100 × 100 × 0.2/86400 ≈ 0.02 events/day. The expected number of accidental
coincidences is, then, E[n_c | r_b, T] = r_b T = 20 (the product r_b T will be indicated
hereafter by λ_b). Let us consider the following numbers of observed coincidences: n_c = 10,
15, 20, 24, 29, 33, 38, roughly corresponding to a difference between observation and
background expectation ranging from −2 to +4 standard deviations [σ(n_c | λ_b) = √λ_b].
The corresponding ℛ values are shown in Fig. 6. We see that all results exclude rate values
above ≈ 0.1 bursts/day, while the experiment loses sensitivity below ≈ 0.001 bursts/day.
In the case of excess of observed coincidences above the background expectation (n_c > λ_b),
the ℛ function has a peak at r_m = n_c/T − r_b, with a peak value of
ℛ_M = e^{-(n_c − λ_b)} (n_c/λ_b)^{n_c}. The peak rises very rapidly with n_c. For example,
for n_c = 38 the peak value is roughly 600, at r_m = 1.8 · 10^{-2} bursts/day.
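The numbers of this example are easy to recompute. The sketch below uses the rounded background rate of 0.02 events/day quoted in the text (the unrounded product 100 × 100 × 0.2/86400 is about 0.023) and Eq. (19) for the belief updating ratio; the variable and function names are ours.

```python
import math

T = 1000.0                     # days of common operation
r_b = 0.02                     # accidental coincidences per day (rounded value of the text)
lam_b = r_b * T                # expected number of background coincidences

def belief_updating_ratio(r, n_c):
    """Eq. (19) for the two-antenna coincidence experiment."""
    return math.exp(-r * T) * (1.0 + r / r_b) ** n_c

for n_c in (10, 15, 20, 24, 29, 33, 38):
    n_sigma = (n_c - lam_b) / math.sqrt(lam_b)   # excess in standard deviations
    r_m = max(n_c / T - r_b, 0.0)                # position of the maximum
    print(n_c, round(n_sigma, 1), r_m, belief_updating_ratio(r_m, n_c))
```

The last line of the loop corresponds to the +4 standard deviation case: the maximum sits at 0.018 bursts/day with a height of roughly 600, while at 0.1 bursts/day the ratio is negligible for every n_c considered.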



Note that, although it is important to present prior-free results, at a certain moment a probability
assessment about r can be important, for example, in forming one's own idea about the most likely
range of r, or in taking decisions about planning and financing of future experiments.

See comments about the choice of the energy threshold in Section 10.2.

Figure 6: Belief updating ratios on a log-log scale for different observations. The abscissa shows
the signal rate, in events per day. The continuous curve corresponds to an observation equal to the
background, the other curves to a difference between observation and background expectation
ranging from −2 to +4 standard deviations. The grey curve (+4 st. dev.) is the case that will
be studied in more detail (see Figs. 7, 8 and 9).

7.2 Turning the results into probabilities 

A peak value of 600 might seem impressive, especially if 'advertised' on a proper
scale (in linear scale the plateau level ℛ = 1 will be confused with 0, and the curve
f(r | n_c = 38) will appear very well separated from r = 0), and could easily convince
non-experts that the searched-for signal exists. Nevertheless, confronted with such a result,
there could be experts with strong physically motivated priors who would still maintain
their scepticism, while others would hesitate. The reason is that in this domain of research
prior knowledge is largely non-intersubjective. Even researchers who are members of the
same experimental team do not usually share the same opinion, and the case of 38 events
over an expectation of 20 is typical of those cases over which there could be disagreement:
disagreement which could cause the result to be left unpublished for years, unless
a charismatic and optimistic team spokesman persuaded his fellows to claim a discovery.

It is interesting to use Bayes' theorem in a reversed mode to understand which kind
of prior produces sceptical, hesitant and optimistic reactions. (We assume that researchers
act in good faith and that they care about their reputation.) Above we have met two
classes of priors: the uniform in r and the uniform in log r. One can easily imagine their
effects, on the basis of the discussion concerning the example in Section 5 (see Figs. 1-3).
But we do not think that there is a single physicist whose prior beliefs correspond exactly
to those of Eq. (10) or (11). It is much more reasonable to expect that someone would
have a rough idea of the order of magnitude of the g.w. burst rate, provided that bursts
exist at all. Now, an easy way to model an uncertain order of magnitude is to think of a normal
distribution in log r, with most of the probability mass concentrated in some decades.
The corresponding distribution of r is called lognormal. Although this parametrization
is, like any other, rough, it has some formal advantages which somehow reflect the prior
knowledge of researchers: varying the two parameters of the distribution, one can choose
the orders of magnitude of the value where the beliefs are concentrated; the probability
density function goes to zero as r → 0 (in agreement with the working hypothesis that
the searched-for signal does exist, and hence at a non-null rate); the probability density
function is defined for all positive values of r, thus capable of persuading even initially
very sceptical people to change their mind as soon as the accumulated evidence starts to
produce a narrow peak in ℛ. When this situation of strong evidence is achieved, scientific
conclusions (summarized in the final probability density function) will not depend on the
details of the priors: all researchers will agree on the interpretation and the result will be
considered objective (although, we repeat, it is only intersubjective).

Table 1: Dependence of initial and final probability for r > 0.01 bursts/day as a function of the
prior parameters (see text).

                  Prior parameters                  P(r > 0.01 bursts/day)
    Reaction      E[log10(r)]    σ(log10(r))        prior            final
    sceptical     −4             0.5                3.2 · 10^{-5}    0.9 %
    hesitant      −3             0.5                2.3 %            50 %
    optimistic    −2             0.5                50 %             85 %

Considering the situation of 38 events observed over an expected background of
20 events, the prior knowledge corresponding to the subsequent sceptical, hesitant and
optimistic reactions can be modelled with a Gaussian in l_r = log10(r) having standard
deviation 0.5 (i.e. half a decade) and averages of −4, −3 and −2. Table 1 gives the
parameters of the three priors, as well as the probabilities that r is above 0.01 bursts/day
before and after the new experimental data. Figures 7 and 8 show the modelled priors and
the corresponding final distributions. The figures are drawn with several scales because
each way of representing them can help one to get a feeling of what is going on.

Figure 7: Pessimistic (continuous) and optimistic (dotted) priors plotted in different scales [l_r
stands for log10(r)]. The dashed line represents an intermediate situation. The grey curve is the ℛ
function for 38 observed events out of 20 expected.
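The probabilities of Table 1 can be approximated with a simple numerical integration in l_r = log10(r). This is only a sketch under the assumptions stated in the text (Gaussian priors in l_r with standard deviation 0.5, and the belief updating ratio of Eq. (19) with T = 1000 days, r_b = 0.02 events/day, n_c = 38); the integration range and step are our own numerical choices.

```python
import math

T, r_b, n_c = 1000.0, 0.02, 38        # days, events/day, observed coincidences

def log_ratio(r):
    """ln of the belief updating ratio of Eq. (19)."""
    return -r * T + n_c * math.log1p(r / r_b)

def prob_above(threshold, mean, sigma=0.5, updated=True, lo=-8.0, hi=0.0, steps=8000):
    """P(r > threshold) for a Gaussian prior in l_r, before or after the data."""
    h = (hi - lo) / steps
    above = total = 0.0
    for i in range(steps):
        l_r = lo + (i + 0.5) * h                       # midpoint rule in l_r
        w = math.exp(-0.5 * ((l_r - mean) / sigma) ** 2)
        if updated:
            w *= math.exp(log_ratio(10.0 ** l_r))      # multiply the prior by the ratio
        total += w
        if l_r > math.log10(threshold):
            above += w
    return above / total

for name, mean in (("sceptical", -4), ("hesitant", -3), ("optimistic", -2)):
    print(name, prob_above(0.01, mean, updated=False), prob_above(0.01, mean))
```

Working in l_r avoids the Jacobian that would be needed when writing the lognormal density in r, since the prior is an ordinary Gaussian in that variable.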




Figure 8: Several representations of the final distributions resulting from the three different
priors of Fig. 7 and the evidence from 38 observed events out of 20 events expected from
background (grey curve of Fig. 6). Note that f stands for the generic symbol of p.d.f., but its
mathematical function depends on the variable via the Jacobian.

7.3 Evolution of beliefs

We conclude this section with some remarks about the choice of the priors used to
illustrate this situation. First, it is clear that the chosen model is important, but what we
want to show is the rough distribution of beliefs. For example, although different
parameters of the lognormal, or a different function of the probability, will produce numerical
variations in the probability of the sceptical reaction, the qualitative conclusions will not
change: nobody possesses a psychological perception of probability which enables them
to tell exactly whether their intuitive probability is 0.5, 1 or 2%. What matters is that
these probabilities are perceived as rather low and well separated from the range which
characterizes hesitation (≈ 40-60%) or almost certainty (≳ 90-95%). Second, although
we are not going to enter into the detail of trying to explain why the three different
researchers have such different priors, it is important to understand that, since we have
in mind real researchers, priors are not simply abstract, aesthetic or philosophical ideas
about the physical quantity. They summarize a complex prior knowledge, based on
previous experimental observations as well as on theoretical ideas related to this and other
observables (we shall come back to this point in Section ^). For example, looking at the
numbers in Table 1 and the plots in Fig. 8 one could easily imagine that if the sceptical
person was faced for a second time with independent evidence of similar strength to that
provided by the 38 observed events (which could again come from antenna data, but also
from other astrophysical information), he/she could be now in a situation similar to that
of the hesitant person and the next time could be in the situation of the optimistic person.
This evolution is illustrated in Fig. 9, which shows the p.d.f. of the initially sceptical
researcher as he/she is faced four consecutive times with such rather strong (and
quantitatively identical - clearly an academic exercise) evidence. After the fourth occasion, he/she
will be strongly convinced that the rate is well above 10^{-3} bursts/day (see dotted curve
of Fig. 9). Asymptotically, when a large number of pieces of evidence are in hand, the
beliefs will be concentrated around the peak of the likelihood (i.e. 1.8 · 10^{-2} bursts/day),
as the prior distribution becomes irrelevant.
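The evolution described here can be sketched by applying the same belief updating ratio k times to the sceptical prior, as in the academic exercise of the text. The parameters are those of the example (T = 1000 days, r_b = 0.02 events/day, n_c = 38, prior Gaussian in log10(r) with mean −4 and standard deviation 0.5); the integration range, the step and the function names are our own assumptions.

```python
import math

T, r_b, n_c = 1000.0, 0.02, 38        # same evidence as in Table 1

def log_ratio(r):
    """ln of the belief updating ratio of Eq. (19)."""
    return -r * T + n_c * math.log1p(r / r_b)

def prob_above_after(k, threshold=0.01, mean=-4.0, sigma=0.5,
                     lo=-8.0, hi=0.0, steps=8000):
    """P(r > threshold) for the sceptical prior after k identical, independent updates."""
    h = (hi - lo) / steps
    above = total = 0.0
    for i in range(steps):
        l_r = lo + (i + 0.5) * h                       # midpoint rule in log10(r)
        # log prior in l_r plus k times ln(ratio): product of k identical ratios
        w = math.exp(-0.5 * ((l_r - mean) / sigma) ** 2 + k * log_ratio(10.0 ** l_r))
        total += w
        if l_r > math.log10(threshold):
            above += w
    return above / total

history = [prob_above_after(k) for k in range(5)]
print(history)
```

The list starts from the prior probability (k = 0) and shows how the belief in r > 0.01 bursts/day grows with each new piece of evidence of the same strength.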

8 Reporting a result with an upper/lower bound 

Although the $\mathcal{R}$ function (which, we repeat, contains the same information as the
likelihood function, but has the practical advantages we have illustrated) represents the
most complete and unbiased way of reporting the result, it might also be convenient to
express with just one number the result of a search which is considered by the researchers
to be unfruitful. Before trying to give some recommendations, it is important to start by
saying that any attempt to find a precise prescription, with the hope that a single number
will summarize the experimental information completely, is a false endeavour. Nowadays
it is not difficult, in fact, to provide the complete function $\mathcal{R}$, parametrized in some way,
no matter how complicated $\mathcal{R}$ might be. This parametrization could be posted on a web
page, or sent on request, if it was too voluminous for a published paper, as might be the
case for multidimensional problems, or if several $\mathcal{R}$ functions are obtained, depending on
different assumptions about systematic effects or about the underlying phenomenology.^17)

Figure 9: Evolution of the p.d.f. of a sceptical person (the grey curve is his/her initial prior)
updated four times by independent evidence characterized by the same $\mathcal{R}$ function provided by
38 observed events over an expected background of 20. Top and bottom plots differ only by the
scales.

16) Given the rough modelling of the sceptical prior of this numerical example, one needs about 20
exposures to evidence of the kind considered, before the barycentre of the final distribution reaches
the simulated 'true value' of $1.8 \times 10^{-2}$ bursts/day. This is due to the rapidly decreasing tail of the
log-normal distribution. It seems to us that, outside the order of magnitude considered more probable,
the intuitive priors of experienced physicists for this kind of frontier physics quantity have flatter tails.
As a consequence, once the final distribution has moved from the decades that one believed to be
more probable, the convergence to the true value becomes faster.
If, anyway, one wants to report the result considered inconclusive as a single number^18)
having the meaning of a bound, Fig. ^ suggests that this number should be in the
region of $r$ where $\mathcal{R}$ has a transition from 1 to 0. This value would then delimit (although
roughly) the region in which the $r$ values are most likely to be excluded from the region
in which it is most likely that the true value lies. One may take, for example, the value
of $r$ for which $\mathcal{R}(r) = 5\%$ or $1\%$ of the insensitivity plateau^19) value (i.e. 0.05 or 0.01),
or any other conventional number. What is important is not to call this value a bound
at a given probability level (or at a given confidence level - the perception of the result
by the user will be the same! [0]). This would be incorrect. In fact the $\mathcal{R}$ function is not
sufficient by itself for assessing a probabilistic statement about the quantity of interest.

17) For example, the ZEUS Collaboration has recently published polynomial parametrizations of
log-likelihoods, which contain the same amount of information as $\mathcal{R}$, for each possible coupling of new
contact interactions between electrons and quarks.

18) As is well known, in the case that $\mathcal{R}$ depends on two parameters, as in neutrino oscillation analyses,
one obtains a contour plot.

19) One could choose, alternatively, a value corresponding to a percentage of the maximum of the
likelihood, and hence of $\mathcal{R}$. As a convention, this could work as well, but: a) there is a problem of how
to handle local maxima of the likelihood in cases more complicated than the one under study; b) if
the maximum of $\mathcal{R}$ is high enough, the $\mathcal{R}$ function corresponding to the bound could be larger than
1. Note that, in the case of local maxima and minima of $\mathcal{R}$, the condition $\mathcal{R}(r) = 5\%$ or $1\%$ yields
multiple solutions. In this case it seems to us natural to choose the one farthest from the insensitivity
region.

If we had to suggest a possible convention for the upper bound, it would be to
choose $r_l$ such that $\mathcal{R}(r_l) = 5\%$. The advantage of this convention is that it is easy
to recover standard limits^20) based on the rule "upper limit equals 3" (divided by the
observation time) when no events are observed. In this case, in fact, the likelihood is
$f(n_c = 0 \mid r) = e^{-rT}$, and also $\mathcal{R}(r; n_c = 0) = e^{-rT}$. Moreover, a standard Bayesian
inference with $f_\circ(r) = k$ (see discussion in Section p75|) produces the same^21) upper bound
[Eq. (I)].
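
As a numerical illustration of this convention, the following Python sketch solves $\mathcal{R}(r_l) = 0.05$ by bisection on the falling side of the function, taking $\mathcal{R}(r) = e^{-rT}(1 + r/r_b)^{n_c}$. The function names are ours, and the values $r_b = 0.02$ events/day and $T = 1000$ days are those of the simulated case study; it is meant only as a sketch under these assumptions, not as the procedure used to produce the numbers quoted in this paper.

```python
import math

def log_R(r, n_c, r_b, T):
    """Logarithm of R(r) = exp(-r*T) * (1 + r/r_b)**n_c."""
    if n_c == 0:
        return -r * T          # the background rate drops out when nothing is observed
    return -r * T + n_c * math.log1p(r / r_b)

def bound_R(n_c, r_b, T, level=0.05, tol=1e-12):
    """Largest r with R(r) = level, found by bisection beyond the peak of R."""
    target = math.log(level)
    lo = max(n_c / T - r_b, 0.0)             # position of the maximum of R (or r = 0)
    hi = lo + 1.0 / T
    while log_R(hi, n_c, r_b, T) > target:   # expand until R has dropped below the level
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if log_R(mid, n_c, r_b, T) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

T = 1000.0      # observation time in days
r_b = 0.02      # background rate in events/day

print(bound_R(0, r_b, T) * T)     # close to 3: the "upper limit equals 3" rule
for n_c in (10, 20, 38):
    print(n_c, bound_R(n_c, r_b, T))
```

With no observed events the product $r_l T$ equals $-\ln 0.05$, which is where the value 3 of the standard rule comes from.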

We end this discussion about summarizing the objective result provided by the $\mathcal{R}$
function in one number with a few cautionary remarks. First, we repeat that this result
should not be called a 95% confidence level upper bound. Second, as can easily be seen
from Fig. ^, we do not think that it is worthwhile trying to define a conventional limit
with the precision of a percentage level. Even if the limit is defined precisely, the expected
statistical fluctuations can easily change the result by about 50% from one experiment
to another. Therefore, if one really wants to quote a number for the upper limit, together
with the $\mathcal{R}$ function, one should simply state the order of magnitude^22) obtained,
for example, with the $\mathcal{R} = 5\%$ convention.

Finally, if one is interested in a limit having a probabilistic statement, one has to
pass through the priors. In this case a possible way of producing conventional probabilistic
limits would be to use a uniform distribution, as discussed in Section ^.5| . Upper bounds
calculated as 95% probability upper limits for the results of Fig. 6 are given in Table 2
and are compared with the bounds obtained using the $\mathcal{R} = 5\%$ rule. We see that the
results are very similar, if we remember that high accuracy in these bounds is not needed,
as discussed above.

Table 2: Comparison of the upper bounds obtained on the rate $r$ using the $\mathcal{R} = 5\%$ rule, or
evaluated as a 95% probability upper limit given by a Bayesian inference with uniform priors,
with reference to the example of Section 7.1 (see Fig. 6). The values of the upper bounds are
rounded as a reminder that we do not consider their exact value to be relevant (more digits are
given within parentheses).

Observed number of events                     10      15      20      24      29      33      38
95% prob. ($10^{-2}$ events/day)              0.5     0.7     1.1     1.4     2       2       3
                                              (0.51)  (0.73)  (1.06)  (1.43)  (1.96)  (2.41)  (2.98)
$\mathcal{R} = 5\%$ rule ($10^{-2}$ events/day)   0.5     0.8     1.3     2       3       4       5
                                              (0.54)  (0.81)  (1.30)  (1.91)  (2.90)  (3.83)  (5.13)
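
The 95% probability row of Table 2 can be cross-checked with a short numerical integration. The Python sketch below assumes a posterior proportional to $e^{-rT}(1 + r/r_b)^{n_c}$ (Poisson likelihood times a uniform prior in $r$), with $r_b = 0.02$ events/day and $T = 1000$ days as in the case study. The function name, the integration range and the grid size are our own choices, and the code is only an illustration of the calculation.

```python
import math

def upper_limit_uniform_prior(n_c, r_b, T, prob=0.95, r_max=0.1, steps=100_000):
    """Probability upper limit on r for a posterior proportional to
    exp(-r*T) * (1 + r/r_b)**n_c, i.e. Poisson likelihood times a flat prior."""
    dr = r_max / steps
    log_w = [-(i * dr) * T + n_c * math.log1p(i * dr / r_b) for i in range(steps + 1)]
    shift = max(log_w)                       # avoid overflow before exponentiating
    w = [math.exp(v - shift) for v in log_w]
    # cumulative trapezoidal integral of the unnormalized posterior
    cum = [0.0]
    for i in range(steps):
        cum.append(cum[-1] + 0.5 * (w[i] + w[i + 1]) * dr)
    target = prob * cum[-1]
    for i in range(1, steps + 1):
        if cum[i] >= target:
            frac = (target - cum[i - 1]) / (cum[i] - cum[i - 1])
            return (i - 1 + frac) * dr       # linear interpolation inside the bin
    return r_max

r_b, T = 0.02, 1000.0
for n_c in (10, 24, 38):
    print(n_c, upper_limit_uniform_prior(n_c, r_b, T))
```

The normalization constant of the posterior cancels in the ratio of integrals, which is why the unnormalized weights are sufficient here.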



20) In this particular case the coincidence of the result obtained by the frequentistic prescription, the
$\mathcal{R}(r_l) = 5\%$ convention, and the standard Bayesian result is a numerical effect due to the
particular likelihood.

21) In the most general case, the 95% probability bound will be different from that obtained by the
$\mathcal{R} = 5\%$ convention, and this is all right as the meaning of the two bounds is different (see previous
footnote).

22) For a discussion about the significant digits of limits, see Sections 9.1.4 and 9.3.5 of Ref. [ pO[ .
Another argument to understand our point is that the upper/lower limits should be considered to
belong to the same category as uncertainties, and not as true values.

9 Does the searched-for process exist?

We have seen how to make the inference about the g.w. burst rate r, under the 
assumption that g.w. bursts exist. At this point some readers might have the objection 
that they are not interested in the values of the rate, but rather in whether g.w. bursts 
exist at all. 

It is quite well understood by scientists and philosophers that, while the observa- 
tion of a phenomenon proves its existence, non-observation does not prove non-existence 
(the classic example that philosophers like for this reasoning is that of the black swan). 
In our case, the problem is complicated by the fact that even the observations are not 
certain proof of the existence. This is because, as long as some background events are 
expected, we cannot be absolutely (mathematically) sure that the searched-for signal has 
been observed, no matter how many events are observed above those statistically expected 
from background alone. This argument concerns not only the frontier problems we are 
dealing with in this paper, but all theoretical concepts, including true values of physi- 
cal quantities. So, to speak rigorously, we should only talk about beliefs. "Nevertheless, 
physics is objective, or at least that part of it that is at present well established, if we 
mean by 'objective' that a rational individual cannot avoid believing it. ... The reason is 
that, after centuries of experimentation, theoretical work and successful predictions, there 
is such a consistent network of beliefs, that it has acquired the status of an objective con- 
struction: one cannot mistrust one of the elements of the network without contradicting 
many others. Around this solid core of objective knowledge there are fuzzy borders which 
correspond to areas of present investigations, where the level of intersubjectivity is still 
very low." ||2^ As a consequence, it is not a question of proving or disproving something 
(unless some impossible consequences have been observed), but rather of how difficult it
is to insert/remove something in/from what is considered to be the most likely network
of beliefs.

Applying these considerations to the case study, the answer to the question whether 
or not g.w. bursts exist involves a complex knowledge of astrophysical and cosmological 
facts and theories. As a consequence, we tend to believe that they could exist until the 
experimental evidence is such that even the lowest conceivable rates of bursts carrying 
enough energy to pass the effective energy threshold, evaluated by the best of our knowl- 
edge, are ruled out. On the other hand, we tend to believe that they have really been 
observed experimentally when energy, rates and shapes of the signals match with the rest 
of the knowledge. 



10 Dependence of the g.w. burst result on some systematic effects 

This last section is dedicated to systematic effects on the result. Every experiment 
belonging to the class of inference that we are treating in this paper has its own problem- 
atics. We consider here only some of the effects which are more relevant for the case study 
we are dealing with, although some of the problems are common to other experiments. 
As a general recommendation, we find very helpful the detailed study of the sources of 
uncertainty, as e.g. listed in the ISO Guide [|^, and the use of conditional probability. For 
a general scheme for the evaluation of uncertainties due to systematic effects, see Section 
2.10.3 of Ref. 0. 

23) For example, if one observed a very high rate of very energetic bursts, which seems incompatible with
the possible sources in the Universe, most physicists would tend to believe that there was something
wrong with the experiment.

Returning to the inference of the g.w. burst rate, we have considered so far a
case study performed using realistic parameters, but we have assumed ideal conditions
concerning some aspects of the coincidence experiment. We will see below how the analysis
strategy changes if we assume that the background is not perfectly known, or if it is not
stationary; or if the physics process is not stationary. Before going into a discussion of
these effects we need to consider the uncertainty about the optimal coincidence window
and about the minimum g.w. energy to which the coincidence experiments are sensitive.

10.1 Choice of the coincidence window 

The coincidence window should be set considering the physical behaviour of the 
sources, the distance between the detectors and the detector characteristics, i.e. the limited 
resolution introduced by the sampling time. But other effects can influence the choice, such 
as the noise that distorts the events or the fact that real signals may have unexpected 
shapes. So, in practice, the optimal coincidence window is usually chosen in order to 
maximize SNR, as a compromise between the demands for a reduced rate of accidental 
background on the one hand and for not missing physical events on the other. Therefore, 
the choice of the coincidence window involves unavoidably some arbitrariness, and several 
values of the window can be envisaged, to cope with the possible assumptions about the 
signals looked for. As a conclusion, we do not think that there is just one way of reporting
results, and the $\mathcal{R}$ values corresponding to different reasonable choices should be presented
separately.

10.2 Uncertainty on the minimum energy of g.w. bursts 

At this point, we need to define more precisely the quantity $r$ which is the subject of
the measurement (the measurand). In fact, the case study assumed two parallel detectors
responding to the same physical event which produced the burst of g.w.'s irradiating the
Earth. This implies that $r$ is the rate of g.w. bursts with energy greater than the highest
energy threshold^24) of the two detectors. Calling the differential energy spectrum of g.w.
bursts $\phi(E)$ $(= dr/dE)$, we have

$$r = \int_{E_{min}}^{\infty} \phi(E)\, dE , \qquad (22)$$

which states that $r$ does, indeed, depend on the minimal energy $E_{min}$ required for the
burst to be detected. This minimal energy is related to the maximum of the two threshold
energies ($E_{th}$). Obviously, one will be interested in measuring the detailed $\phi(E)$, when
the high sensitivity of future detectors will allow measurement of many burst candidates
for different energy thresholds.

It is easy to understand that the definition of the measurand given by Eq. (22)
does not correspond to what is actually detected. In fact there is not a one-to-one
correspondence between the energy resulting from the filter ($E_m$) and the energy of the burst.
This is true for all kinds of measurements, with the only difference being that in the case
of g.w. detection the spread of $E_m$ around the true energy $E$ can be rather large, depending
on SNR. In fact, the intrinsic physical reason for this spread is noise: the measured
energy of the detected signal depends on the randomness of size and phase of the noise
at the moment of g.w. interaction B^, [50]. A further source of spread is the performance
of the filter used in the analysis, as shown in Ref. |3J .

24) We clarify that when we talk about energy, we really refer to g.w. energy, and not to mechanical
energy released to the antenna.

This kind of problem can be partially solved, at the expense of a greater uncertainty
about the measured quantity, if one is able to model the distortion of the spectrum $\phi(E)$
into $\phi(E_m)$. This can be done by mapping the transition probability $E \to E_m$ with
a p.d.f. $f(E_m \mid E)$, which, in a discrete approximation, can be thought of as a transfer
matrix. The knowledge of this transfer matrix then allows $\phi(E_m)$ to be unfolded to infer
$\phi(E)$.^25) However, unfolding g.w. burst energy spectra goes beyond the purposes of this
paper. Therefore, hereafter it will be assumed that $r$ corresponds to the definition of the
measurand given in Eq. (22).

10.3 Non-stationarity of the signal 

Another assumption which entered in our previous considerations is that the g.w.
bursts have a constant rate during the observation time. This assumption is consistent with
the present status of knowledge and it leads researchers to model the arrival time of the
bursts with a Poisson process of constant intensity over the whole period of observation.
Nevertheless, one can envisage analysing the data with the hope of finding evidence for
g.w. bursts in a short period, perhaps triggered by other independent observations which
happened in the same period.^26) With this possibility in mind, it is preferable to analyse
the data in subperiods, in order to exploit the potential of the information collected. The
result over the full period of observation can be easily recovered by merging the partial
results, as discussed in Section |^. It is easy to prove that, in fact, the result over the full
period during which the noise has been stationary is exactly the same as can be evaluated
by combining the subperiods, since

$$\mathcal{R} = \prod_i \mathcal{R}_i = e^{-r \sum_i T_i} \left(1 + \frac{r}{r_b}\right)^{\sum_i n_{c_i}} = e^{-rT} \left(1 + \frac{r}{r_b}\right)^{n_c} , \qquad (23)$$

where $T = \sum_i T_i$ and $n_c = \sum_i n_{c_i}$. For this reason, it is preferable to keep results on short
periods of observations separate.
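
The identity of Eq. (23) is easy to check numerically. In the Python sketch below the three subperiods and their numbers of coincidences are hypothetical values chosen by us (they add up to $T = 1000$ days and $n_c = 38$), and the background rate is assumed to be the same in every subperiod, as required by Eq. (23).

```python
import math

def R(r, n_c, r_b, T):
    """R(r) = exp(-r*T) * (1 + r/r_b)**n_c for a known background rate r_b."""
    return math.exp(-r * T) * (1.0 + r / r_b) ** n_c

r_b = 0.02                                            # events/day, same in all subperiods
subperiods = [(250.0, 9), (250.0, 12), (500.0, 17)]   # hypothetical (T_i, n_ci) pairs
T = sum(T_i for T_i, _ in subperiods)
n_c = sum(n_i for _, n_i in subperiods)

def product_of_subperiods(r):
    """Product of the R functions of the individual subperiods."""
    return math.prod(R(r, n_i, r_b, T_i) for T_i, n_i in subperiods)

for r in (0.005, 0.018, 0.04):
    print(r, product_of_subperiods(r), R(r, n_c, r_b, T))
```

The two printed columns agree up to rounding errors, whatever the way in which the full period is split.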

25) For examples of unfolding methods currently used in particle physics, when this kind of problem is
encountered, see Refs. [Q, and [52]. An elementary introduction to the problem of unfolding
methods, as well as of other simple methods, can be found in Ref. pil. More sophisticated methods
for spectrum unfolding and, generally speaking, image reconstruction, are presented in Refs. [55] and
respectively.

26) Given the actual levels of background in present-generation detectors, it is very unlikely that one would
be persuaded that a genuine train of g.w. bursts had really arrived in a short observation time, unless it
happened to be a truly spectacular phenomenon. Nevertheless, one could 'gate' the candidate events
by other pieces of evidence coming from independent sources of information concerning something
that happened in a narrow time window. Such independent information could be related to optical,
neutrino or $\gamma$-ray burst observations.

10.4 Uncertainty on the value of the background rate

We have assumed that the expected background rate is well known, as can be cross-checked
using the off-timing technique and estimation from individual background rates.
Nevertheless, one may be in a state of uncertainty about $r_b$; for example if the observation
time is very small at a given level of background (see discussion below concerning the
non-stationarity of background). In this case $r_b$ will be an uncertain number too and so will
be characterized by a p.d.f. $f(r_b)$. One then has a likelihood $f(n_c \mid r, r_b)$ for each possible
value of $r_b$. The likelihood which takes into account all possible values of $r_b$, each weighted
with its degree of belief $f(r_b)$, is obtained by the rules of probability, yielding

$$f(n_c \mid r) = \int f(n_c \mid r, r_b)\, f(r_b)\, dr_b . \qquad (24)$$

The case of a well-known background rate ($r_b = r_{b_\circ}$) is recovered when $f(r_b) = \delta(r_b - r_{b_\circ})$,
where $\delta(\cdot)$ is the Dirac delta function. Note that $f(n_c \mid r)$ will no longer be a Poisson
distribution, and therefore the expression of the $\mathcal{R}$ function will also be more complicated than
that of Eq. (19), although this is just a computational complication. This consideration
leads to the prediction that the frequency distribution of random coincidences made over
a long period of time, during which $r_b$ fluctuates, can look quite different from a Poisson
distribution.
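
The effect of Eq. (24) can be seen with a minimal Python sketch in which the integral is replaced by a sum over a discrete approximation of $f(r_b)$. The three background values and their weights are purely hypothetical choices of ours, made only to show that the resulting distribution of random coincidences is broader than a Poisson distribution with the same mean.

```python
import math

def poisson(n, mu):
    """Poisson probability of n events for an expected number mu."""
    return math.exp(-mu + n * math.log(mu) - math.lgamma(n + 1))

# Hypothetical discrete approximation of f(r_b): (value in events/day, probability)
f_rb = [(0.010, 0.25), (0.020, 0.50), (0.030, 0.25)]
T = 1000.0   # observation time in days

def likelihood(n_c, r):
    """Discrete form of Eq. (24): sum over r_b of f(n_c | r, r_b) * f(r_b)."""
    return sum(p * poisson(n_c, (r + rb) * T) for rb, p in f_rb)

# Distribution of the random coincidences alone (r = 0)
probs = [likelihood(n, 0.0) for n in range(200)]
mean = sum(n * p for n, p in enumerate(probs))
var = sum((n - mean) ** 2 * p for n, p in enumerate(probs))
print(mean, var)   # for a Poisson distribution the two numbers would be equal
```

In this example the variance exceeds the mean by the variance of the expected number of background events, which is the signature of the non-Poissonian behaviour mentioned above.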

A last remark concerns the meaning of $f(r_b)$. This function is meant to describe
the uncertainty about the exact value of $r_b$ in a period which is considered to be stable to
the best of our knowledge, and not the measured frequency distribution of the background
rate during a long period. If one knows that different subperiods each had a different value
of $r_b$ (within the unavoidable uncertainty), this information must be used in a different
way, as will be shown in the next section.



10.5 Non-stationarity of the noise 

One of the most important and unavoidable problems in coincidence experiments 
is the non-stationarity of the noise. In fact, the chance of detecting a g.w. of minimum 
energy $E_{min}$ depends not only on the filter threshold but also on the level of noise. A high
level of noise acts as a high effective threshold for the g.w. signals, and, therefore, the 
high rate of collected data with a filter threshold much lower than this effective threshold 
contains no useful information for the coincidence analysis. 

One might envisage two possible strategies for the data-taking of the individual 
antenna. 

— Fixed energy threshold, as, for example, used in Ref. [^. The threshold is fixed at 
a constant energy level independent of the detector noise. It follows that the event 
rate due to the background varies according to the detector noise level. However, 
as said above, this option does not imply that the rate of events due to g.w.'s is 
constant. 



— Varying energy threshold, as, for example, used in Ref. |]39|. The threshold level 
is given in terms of SNR, i.e. the energy of the threshold varies according to the 
detector noise. The event rate due to the background remains constant, while that 
due to the signals varies according to each threshold value (i.e. according to each 
sensitivity level of the detector). 

We are aware of the complexity of this problem, and a full treatment of it goes beyond
the purpose of this paper. However, we think that some general considerations can be
made. The basic observation is that, as far as possible, in the final analysis one should try
to keep the definition of the measurand fixed, as discussed above. If this goal is achieved
(within the unavoidable uncertainties), although only in subperiods, it is easy to combine
the results referring to the same measurand by multiplying the $\mathcal{R}$ functions of each
subperiod:

$$\mathcal{R}(r) = \prod_i \mathcal{R}_i(r) = \prod_i e^{-r T_i} \left(1 + \frac{r}{r_{b_i}}\right)^{n_{c_i}} . \qquad (25)$$

In contrast, results referring to different measurands, i.e. coincidences obtained at different
effective thresholds, should be kept separate.

As a consequence of the above considerations, the natural procedure seems to 
be somewhat in between the two strategies outlined above. During the data-taking it is 
preferable to vary the threshold setting in order to keep the SNR at its lowest possible value 
and thus maximize the chance of detecting g.w. events. The large amount of background 
events which are collected in this way can be reduced by applying a more sophisticated 
selection before using the g.w. burst candidates for the coincidence procedure. Then, at 
the moment of the final analysis, the data should be reorganized according to the effective
threshold of the less sensitive antenna (see Section 10.2). This is equivalent to performing
many experiments at different effective thresholds, the result of each of which should be
presented separately.

At this point, we think that a very simple simulation could help to make our points
clearer. Let us take a numerical example by considering the case of two periods over which
the system was stationary, with effective energy threshold $E_{th_i}$ ($i = 1, 2$). We take each
period of $T_i = 1000$ days, for a total observation time of 2000 days.^27) In each time interval
the background rate is indicated by $r_{b_i}$ and we simulate $n_{th_i}$ g.w. bursts, plus a number
of random coincidences exactly equal to the expectation value. Although the situation is
obviously oversimplified, we think that it should help to make the general considerations
more easily comprehensible.

10.5.1 Consequences of analysing together data taken at different effective thresholds 

Let us consider $r_{b_1} = 0.02$ events/day, yielding 20 background events in $T_1$ at
the effective energy threshold $E_{th_1}$. At this energy threshold we simulate 18 genuine g.w.
bursts. The result of this simulation is therefore $n_{c_1} = 38$ observed coincidences.

During the period $T_2$ of a more noisy situation the threshold has been properly
raised, to $E_{th_2} > E_{th_1}$, in order to keep $r_{b_2} = 0.02$ events/day. Let us assume this result was
obtained by doubling the threshold, i.e. $E_{th_2}/E_{th_1} = 2$. On the other hand, the number
of coincidences due to g.w. bursts changes, because a certain fraction of the bursts will go
below threshold. To get the order of magnitude of the effect, let us take the number of
sources within the sensitivity volume to increase as $d^3$, where $d$ is the maximum Earth-source
distance reachable at the chosen threshold. Instead, the energy of g.w.'s released in the
antenna goes like $d^{-2}$. Thus, the rate of observable bursts scales as the ratio of the two energy
thresholds to the power $-3/2$. Then, during $T_2$ we get $n_{c_2} \approx 20 + 18 \times 2^{-3/2} \approx 26$.
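
The arithmetic of this scaling can be reproduced in a few lines of Python. The sketch only encodes the assumptions stated above (number of sources growing as $d^3$, received energy falling as $d^{-2}$); the function name is ours.

```python
def signal_scale(threshold_ratio):
    """Fraction of bursts still above threshold when the threshold is raised by the
    given factor, assuming the number of sources ~ d**3 and the energy ~ d**-2."""
    return threshold_ratio ** -1.5

n_bkg, n_gw = 20, 18      # background events and simulated g.w. bursts at E_th1
for ratio in (2, 5):
    n_gw_2 = n_gw * signal_scale(ratio)
    print(ratio, round(n_gw_2, 2), round(n_bkg + n_gw_2, 1))
```

Doubling the threshold leaves about 6 of the 18 simulated bursts, which together with the 20 background events gives the 26 coincidences quoted in the text.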

27) These very long subperiods are chosen to give a sufficient number of coincidences when we consider
only two data samples. Obviously, the same considerations hold if the subperiods have lengths of
hours, as is more reasonable.

Figure 10: Effect of naive combination of data taken at different effective thresholds. In both
plots the dotted curve is the $\mathcal{R}$ function of the data taken in the low-noise period ($T_1$). The
black dashed curve is the $\mathcal{R}$ function of the data taken in the more noisy period ($T_2$) such that
the threshold had to be varied by a factor of 2 (upper plot) or 5 (lower plot). The continuous
and the grey dashed lines (overlapping) represent $\mathcal{R}_1 \cdot \mathcal{R}_2$ and $\mathcal{R}_{av}$ (see text).

The upper plot of Fig. 10 shows the situation: the dotted curve is $\mathcal{R}_1$, the
black dashed curve is $\mathcal{R}_2$ (its peak value is $\approx 1.1$). The peak of $\mathcal{R}_1$ and the peak of
$\mathcal{R}_2$ occur at different abscissae ($r_{m_1} = 1.8 \times 10^{-2}$ bursts/day, $r_{m_2} = 0.6 \times 10^{-2}$ bursts/day),
as expected due to the difference in the thresholds. Since the two periods refer to different
measurands, the product $\mathcal{R}_1 \cdot \mathcal{R}_2$ has no inferential meaning. Let us plot it, just to see
what one would obtain by making improper use of the combination rule given by Eq.
(21). Let us imagine also, for comparison, the full period analysed as a single coincidence
experiment. The corresponding $\mathcal{R}$ function is indicated by $\mathcal{R}_{av}$ and is obtained using a
total number of simulated coincidences $n_c \approx 40 + 18 + 6 = 64$. Figure 10 shows that
$\mathcal{R}_1 \cdot \mathcal{R}_2$ and $\mathcal{R}_{av}$ coincide, but this does not justify the use of $\mathcal{R}_{av}$, as we know that
$\mathcal{R}_1 \cdot \mathcal{R}_2$ is wrong too. It is important to note that the position of the peak has moved
to a value strongly influenced by the data taken in the less sensitive period. The peak
value is also reduced. Both these effects are a consequence of the mixing of different
physics quantities. This effect can be shown more clearly by a new simulation in which
the effective threshold during $T_2$ is raised by a factor of 5 (bottom plot of Fig. 10). While
the first period contains quite strong evidence in favour of g.w. bursts of energy $E > E_{th_1}$
and the second period provides a strong constraint for bursts of the higher energy $E_{th_2}$,
the incorrect combinations mix up the two pieces of information, effectively spoiling both
individual results.
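
The shift of the peak caused by the improper combination can be verified with the following Python sketch, which uses the numbers of the simulation with the threshold doubled ($n_{c_1} = 38$, $n_{c_2} = 26$, $r_b = 0.02$ events/day in both periods of 1000 days). The helper names and the grid used for the peak search are our own choices.

```python
import math

def log_R(r, n_c, r_b, T):
    """Logarithm of R(r) = exp(-r*T) * (1 + r/r_b)**n_c."""
    return -r * T + n_c * math.log1p(r / r_b)

def peak(log_f, r_max=0.05, steps=50_000):
    """Grid search for the position of the maximum of a function of r."""
    grid = [i * r_max / steps for i in range(steps + 1)]
    return max(grid, key=log_f)

r_b, T1, T2 = 0.02, 1000.0, 1000.0
n_c1, n_c2 = 38, 26

log_R1 = lambda r: log_R(r, n_c1, r_b, T1)
log_R2 = lambda r: log_R(r, n_c2, r_b, T2)
log_prod = lambda r: log_R1(r) + log_R2(r)                 # improper product R1 * R2
log_Rav = lambda r: log_R(r, n_c1 + n_c2, r_b, T1 + T2)    # full period as one experiment

print(peak(log_R1), peak(log_R2), peak(log_prod))
print(log_prod(0.01), log_Rav(0.01))
```

The product and the single-experiment function coincide because the background rate is the same in the two periods, and their common peak falls between the peaks of the two individual functions.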



10.5.2 Combination of data having the same effective threshold 

The results on the burst rate, with g.w.'s having a minimum energy $E_{th_1}$, can only
be obtained using the data collected during $T_1$. In contrast, information about g.w. bursts
exceeding $E_{th_2}$ can be obtained from both periods. The proper combination of the two
pieces of information is achieved by selecting the subsample of events taken during $T_1$
which have $E > E_{th_2}$. The rate of background events exceeding $E_{th_2}$ can be evaluated
by taking an exponential law relating threshold and rate, obtained assuming a Gaussian
noise for the amplitude |5^. We then obtain $r'_{b_1} = r_{b_1}\, e^{-E_{th_2}/E_{th_1}} \approx 2.7 \times 10^{-3}$ events/day. The number
of observed events in our simplified simulation is therefore $n'_{c_1} = 2.7 + 18 \times 2^{-3/2} \approx 9$.

Figure 11: Combination of data sets using the same effective energy threshold. The dotted
curves are the $\mathcal{R}$ functions of the data taken in the low-noise period ($T_1$), selected to have an
effective threshold equal to the data taken in the more noisy period ($T_2$). The ratio of the
selection threshold to the data-taking threshold is a factor 2 (upper plot) and a factor 5 (lower
plot). The continuous lines represent $\mathcal{R}_{1\&2} = \mathcal{R}_1 \cdot \mathcal{R}_2$, i.e. the correct combination of the two
data sets. The grey dashed lines represent, instead, the result obtained by a naive average of the
two data sets.
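
The numbers entering the selected $T_1$ sample can be reproduced as follows. The Python sketch simply applies the exponential law for the background and the $-3/2$ power law for the signal, as stated in the text, for a threshold ratio of 2; the variable names are ours.

```python
import math

r_b1, T1 = 0.02, 1000.0       # background rate (events/day) and duration (days) of T1
threshold_ratio = 2.0         # E_th2 / E_th1

r_b1_sel = r_b1 * math.exp(-threshold_ratio)    # background rate above E_th2 during T1
n_gw_sel = 18 * threshold_ratio ** -1.5         # simulated g.w. bursts above E_th2
n_c1_sel = r_b1_sel * T1 + n_gw_sel             # selected coincidences in T1

print(r_b1_sel, n_c1_sel)
```

The result is a background rate of about $2.7 \times 10^{-3}$ events/day and about 9 selected coincidences.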

The $\mathcal{R}$ functions relative to $E > E_{th_2}$ for the two periods are plotted in the upper
plot of Fig. 11: the dotted curve is $\mathcal{R}_1$ and the black dashed curve is $\mathcal{R}_2$. The peaks
of $\mathcal{R}_1$ and $\mathcal{R}_2$ are now both at $r_m = 0.6 \times 10^{-2}$ bursts/day, as the data refer to the
same effective energy threshold. The combined result is obtained by multiplying the two
partial results (i.e. $\mathcal{R}_{1\&2} = \mathcal{R}_1 \cdot \mathcal{R}_2$) and is shown by the continuous line in Fig. 11. The
evidence achieved by the combination of the two results is much better than was obtained
in the good (i.e. low-noise) period alone. This is a general result obtained in a natural
way in the approach presented, and is in qualitative agreement with intuition. In fact, it
is reasonable to think that, if data are analysed correctly, even a very noisy period, in
which the detector is practically blind, should not spoil the evidence provided by the good
period. The overall evidence should increase as long as new data sets each containing a
bit of information are added to the analysis. The formalization of these considerations
comes from the observation that $\mathcal{R}$ has a constant value of 1 for $r_b \to \infty$ [see footnote
immediately after Eq. (19)].

The $\mathcal{R}$ function obtained by averaging the two periods, and indicated again by
$\mathcal{R}_{av}$, is now obtained using $r_b = 1.13 \times 10^{-2}$ events/day and $n_c = 35$ coincidences. The
corresponding $\mathcal{R}_{av}$ function is shown, for comparison, by the grey dashed line in Fig. 11.
This way of combining the data is unjustified, as it is not derived from general rules of
inference. Besides general arguments, the figure shows that the naive combination makes
less efficient use of the evidence provided by the two sets of data. As before, a new
simulation in which the expected background rate during $T_2$ is five times that during $T_1$
can illustrate this result more clearly. In fact, in the bottom plot of Fig. 11, one can now
see that the noisy period provides only a very small piece of evidence; nevertheless, the
correct combination of the two periods takes advantage even of this very tiny piece of
evidence, and the combined $\mathcal{R}_{1\&2}$ has a peak slightly higher than $\mathcal{R}_1$. One can see that
the naive combination of the two periods, on the other hand, spoils the result obtained by
the first period alone. This is obviously absurd: it is true that an infinitely noisy period
brings no new information to the physical quantity of interest, but neither should it spoil
the result achieved in the good period.
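
The comparison between the correct and the naive combination can be sketched numerically. The Python example below uses the rounded numbers of the simulation with the threshold doubled (9 selected coincidences over a background rate of $2.7 \times 10^{-3}$ events/day in $T_1$, 26 coincidences over 0.02 events/day in $T_2$) and compares the peak values of the resulting functions. Helper names and the search grid are our own, and the peak value is used here only as a rough indicator of the strength of the evidence.

```python
import math

def log_R(r, n_c, r_b, T):
    """Logarithm of R(r) = exp(-r*T) * (1 + r/r_b)**n_c."""
    return -r * T + n_c * math.log1p(r / r_b)

def peak_value(log_f, r_max=0.05, steps=50_000):
    """Maximum of exp(log_f) on a grid of r values."""
    return math.exp(max(log_f(i * r_max / steps) for i in range(steps + 1)))

T1 = T2 = 1000.0
n_c1, r_b1 = 9, 2.7e-3      # T1 data selected above E_th2
n_c2, r_b2 = 26, 0.02       # T2 data

log_R1 = lambda r: log_R(r, n_c1, r_b1, T1)
log_R12 = lambda r: log_R1(r) + log_R(r, n_c2, r_b2, T2)     # correct combination
log_Rav = lambda r: log_R(r, n_c1 + n_c2, 0.5 * (r_b1 + r_b2), T1 + T2)   # naive average

print(peak_value(log_R1), peak_value(log_R12), peak_value(log_Rav))
```

With these inputs the product of the two functions peaks higher than the function of the good period alone, while the naive average peaks lower.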



11 Conclusions 

The problem of reporting the result about the intensity of a Poisson process at 
the limit of the detector sensitivity and in the presence of background has been analysed 
from the perspective of probabilistic inference. This approach assumes that probability is 
related to the status of uncertainty and its value classifies the plausibility of hypotheses 
in the light of all available knowledge. We consider this approach the most general one to 
draw probabilistic conclusions in conditions of uncertainty, which is always the case when 
we want to infer the value of a physics quantity from experimental observations. 

This approach is also known as Bayesian statistics because of the key role played 
by Bayes' theorem in updating probability in the light of new data. We have given argu- 
ments to show that Bayes' theorem is quite natural and produces results in qualitative 
agreement with intuition. That probabilistic conclusions depend also on priors is natural 
too, although their presence tends to produce uneasiness in the practitioners. This kind 
of 'priors anxiety' can be overcome if one understands their meaning and their role, which 
we have illustrated here with examples. 

We have shown that the contribution of the priors becomes irrelevant in routine
cases, i.e. when the response of the detector is very narrow around the true value. However,
in frontier-science measurements, priors become crucial; so crucial that it is preferable to
refrain from providing probabilistic results. In this situation, the most objective way of
reporting the result is to give directly likelihoods, or rescaled likelihoods in the form of
relative belief updating ratios ($\mathcal{R}$ functions), described in this paper. The advantage of
reporting $\mathcal{R}$ functions is that they are easily perceived and the combination of several
experimental results can be achieved in the most efficient way.

From the perspective illustrated in this paper, we consider the search for a unique 
and objective prescription to calculate upper/lower limits (or contour curves, in the 
case of two-dimensional problems), which would summarize efficiently the result of the 
experiment and would allow a consistent combination of results, to be a false problem. Nowadays 
it is easy to provide the complete $\mathcal{R}$ function, or several $\mathcal{R}$ functions, depending on 
assumptions with regard to systematic effects. Nevertheless, we understand that it can 
be practical to summarize the results with a number which roughly separates the region 
in which the experiment loses sensitivity (and $\mathcal{R}$ goes to 1) from the region practically 
ruled out by the data ($\mathcal{R} \to 0$). This number can be based on a conventional value of 
the $\mathcal{R}$ function in the region of transition between 1 and 0. We have shown that similar 
numbers for the bounds can be obtained using a standard Bayesian inference which uses 
a uniform prior. The upper/lower bounds calculated in this latter way can be interpreted 
as the probabilistic limits that would be evaluated by the researchers sharing a positive 
attitude toward the possibility of the planned search. 
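
The comparison between a conventional bound and a uniform-prior limit can be illustrated with a short sketch. Everything specific here is an assumption for illustration: the Poisson counting model with known background, the conventional level R = 0.05, the 95% probability level, and all function names. The bound is the rate at which R falls to the chosen level; the uniform-prior limit is the rate below which 95% of the normalized likelihood lies.

```python
import math


def r_function(r, n_obs, background_rate, exposure=1.0):
    """R(r) = L(r) / L(0) for n_obs ~ Poisson((r + background_rate) * exposure)."""
    return math.exp(-r * exposure) * (1.0 + r / background_rate) ** n_obs


def r_bound(n_obs, background_rate, exposure=1.0, level=0.05):
    """Rate at which R drops to `level` on the falling side of its maximum."""
    # R is maximal at max(0, n_obs/exposure - background_rate) and decreases beyond it.
    lo = max(0.0, n_obs / exposure - background_rate)
    hi = lo + 1.0 / exposure
    while r_function(hi, n_obs, background_rate, exposure) > level:
        hi = lo + 2.0 * (hi - lo)  # widen the bracket until R < level
    for _ in range(200):  # plain bisection
        mid = 0.5 * (lo + hi)
        if r_function(mid, n_obs, background_rate, exposure) > level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def uniform_prior_upper_limit(n_obs, background_rate, exposure=1.0,
                              cl=0.95, steps=20000):
    """Upper limit at probability `cl` when the posterior is proportional to R(r)."""
    # Integrate up to a rate where R is negligible (1e-9 of its value at r = 0).
    r_top = r_bound(n_obs, background_rate, exposure, level=1e-9)
    dr = r_top / steps
    values = [r_function(i * dr, n_obs, background_rate, exposure)
              for i in range(steps + 1)]
    total = sum(0.5 * (values[i] + values[i + 1]) * dr for i in range(steps))
    running = 0.0
    for i in range(steps):
        running += 0.5 * (values[i] + values[i + 1]) * dr
        if running >= cl * total:
            return (i + 1) * dr
    return r_top


if __name__ == "__main__":
    for n_obs in (0, 1, 3):
        bound = r_bound(n_obs, background_rate=1.0)
        limit = uniform_prior_upper_limit(n_obs, background_rate=1.0)
        print(f"n_obs = {n_obs}: R = 0.05 bound = {bound:.3f}, "
              f"uniform-prior 95% limit = {limit:.3f}")
```

With no observed events and unit exposure, R(r) = exp(-r), so both numbers equal ln 20; for other counts they generally differ, and the sketch lets the reader check by how much.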

The ideas illustrated in this paper have already been applied to combine all pieces 
of evidence able to constrain the Higgs boson mass [48] and to the analysis of deep- 
inelastic scattering events to search for new contact-type interactions between electrons 
and quarks [^6|, ^ . We have shown here that they are very useful in the analysis of grav- 
itational wave bursts in coincidence experiments. Indeed, the publication of the results in 
terms of $\mathcal{R}$ functions for signals above a well-defined effective threshold (within unavoid- 
able uncertainty) represents an efficient way of taking advantage of all possible pieces of 
evidence hidden in the data. 

Note added 

We would like to draw the reader's attention to an interesting report 
which appeared while the present paper was going through the final editing procedures. 
Although much of the effort of the author has been dedicated to "deduce correct confidence 
limits", the report of Eitel shows for the first time (log-)likelihood functions of neutrino 
oscillation experiments as 3-D plots (Figs. 2, 5 and 13). Since offsets in log-likelihoods 
are equivalent to factors in likelihoods, these results can be easily reinterpreted with 
the language developed in our paper: The asymptotic insensitivity region corresponds 
to level 100 of Fig. 2 and level of Fig. 13 (unfortunately, level ~ 84 is out of scale 
in Fig. 5). Moreover, the similarity between the curves of Fig. 14 of Ref. |]5^ and the 
$\mathcal{R}$ functions of our paper is self-evident. Indeed, these curves transmit the experimental 
result immediately and intuitively. Comparing Figs. 13 and 6 (and then extrapolating 
to Fig. 5, where the 'flat ridge' is missing), one can realize how misleading the standard 
way of presenting neutrino oscillation results as spots in the $\{\sin^2 2\theta, \Delta m^2\}$ plane can be. 
Figure 6 gives the impression that LSND rules out all parameter space outside the spots. 
However, Fig. 14 shows that LSND only rules out the parameter region which is also 
excluded by KARMEN. Most of the complementary region is the region of insensitivity. 
In the boundary between these two regions (where KARMEN has already lost sensitivity), 
there is certainly a spot where there is very high evidence (we assume no systematic effects 
have been overlooked), but this evidence cannot necessarily lead us to believe that the 
true values of $\sin^2 2\theta$ and $\Delta m^2$ are there, unless we have other reasons to believe it. 
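
The remark that offsets in log-likelihoods are equivalent to factors in likelihoods can be written out explicitly. In the sketch below, $\theta$ stands for the oscillation parameters, and $\theta_{\mathrm{ins}}$ is a label introduced here (not notation taken from the report) for any point in the asymptotic insensitivity region, where the likelihood is flat:

```latex
\mathcal{R}(\theta) = \frac{\mathcal{L}(\theta)}{\mathcal{L}(\theta_{\mathrm{ins}})}
\quad\Longleftrightarrow\quad
\ln \mathcal{R}(\theta) = \ln \mathcal{L}(\theta) - \ln \mathcal{L}(\theta_{\mathrm{ins}})
```

Subtracting the log-likelihood level of the insensitivity region from a plotted log-likelihood therefore gives $\ln \mathcal{R}$, which tends to 0 (that is, $\mathcal{R} \to 1$) where the experiment loses sensitivity.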